ADA081736 



In-House  Report 
October  1979 




PROCEEDINGS OF THE RADC

SPECTRUM ESTIMATION WORKSHOP



APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED




ROME  AIR  DEVELOPMENT  CENTER 

Air  Force  Systems  Command 

Griffiss Air Force Base, New York 13441




This  report  has  been  reviewed  by  the  RADC  Public  Affairs  Office  (PA)  and 
is releasable to the National Technical Information Service (NTIS). At NTIS
it  will  be  releasable  to  the  general  public,  including  foreign  nations. 

RADC-TR-79-63  has  been  reviewed  and  is  approved  for  publication. 


APPROVED:

PAUL VAN ETTEN
Project Engineer


APPROVED:

FRANK J. REHM
Technical Director
Surveillance Division


FOR THE COMMANDER:

JOHN P. HUSS
Acting Chief, Plans Office


Do not return this copy. Retain or destroy.


REPORT DOCUMENTATION PAGE

1. REPORT NUMBER: RADC-TR-79-63
2. GOVT ACCESSION NO.:
3. RECIPIENT'S CATALOG NUMBER:
4. TITLE (and Subtitle): PROCEEDINGS OF THE RADC SPECTRUM ESTIMATION WORKSHOP
5. TYPE OF REPORT & PERIOD COVERED: In-House Report
6. PERFORMING ORG. REPORT NUMBER: N/A
7. AUTHOR(s): Multiple
8. CONTRACT OR GRANT NUMBER(s): N/A
9. PERFORMING ORGANIZATION NAME AND ADDRESS: Rome Air Development Center
   (OCTS), Griffiss AFB NY 13441
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: P.E. 62702F
11. CONTROLLING OFFICE NAME AND ADDRESS: Same
12. REPORT DATE: October 1979
13. NUMBER OF PAGES: 301
14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office):
    Same
15. SECURITY CLASS. (of this report): UNCLASSIFIED
15a. DECLASSIFICATION/DOWNGRADING SCHEDULE: N/A
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release;
    distribution unlimited.
17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different
    from Report): Same
18. SUPPLEMENTARY NOTES: RADC Project Engineer: Paul Van Etten (OCTS)
19. KEY WORDS (Continue on reverse side if necessary and identify by block
    number): Signal Processing; Spectrum Estimation
20. ABSTRACT (Continue on reverse side if necessary and identify by block
    number):

This is the second Spectrum Estimation Workshop sponsored by RADC to provide a
means  for  key  researchers  in  the  field  to  describe  their  work  and  also  provide 
a  means  for  comparing  the  work  of  various  researchers  using  a  common  data  base 
for  representative  problems  of  importance  to  the  Air  Force.  This  report  is  a 
collection  of  papers  that  were  submitted  for  presentation  at  RADC's  Spectrum 
Estimation Workshop held 3, 4, and 5 October 1979 at Griffiss Air Force Base,
NY  13441.  The  papers  were  published  as  received  by  RADC  and  have  not  been 
edited.  Further,  publication  of  these  papers  does  not  represent  approval  or 


endorsement by the Rome Air Development Center or the U.S. Air Force.

Proceedings of the first workshop are available from DDC, #A054650.

Participants  were  also  presented  with  a  set  of  sample  problems  called  the 
Spectral  Estimation  Experiment.  The  object  of  this  experiment  was  to  establish 
a  basis  for  comparison  of  the  wide  variety  of  techniques  available  as  a  function 
of selected applications on both real and artificial data sets representing
specialized  problem  classes  which  are  of  interest  to  the  government.  The  common 
data  base  offers  several  additional  advantages. 

Four problems have been formulated by the workshop committee. They fall
generally into the areas of radar, pattern recognition and system
identification.

The  detailed  description  of  the  problem  and  the  solutions  as  determined  by  the 
many  different  algorithms  employed  will  be  published  separately. 




RADC's SPECTRUM ESTIMATION WORKSHOP
AGENDA

3, 4, and 5 October 1979


Wednesday, 3 October

0830-0930  Registration

0935  Welcome

0940  Administrative Announcements, Clarence Silfer (Co-chairman)

      An Introduction to the Second RADC Spectral Estimation Workshop,
      Lester A. Gerhardt (Co-chairman), Rensselaer Polytechnic Institute

Session I  Dr. Henry Radoski, AFOSR

1000  Minimum Cross-Entropy Spectral Analysis - Introduction and Examples,
      John E. Shore and Rodney W. Johnson, Naval Research Laboratory

1020  Coffee Break

1100  Optimal Estimation for Bandlimited, Time-Concentrated Signals,
      D. P. Kolba and T. W. Parks, Rice University

1120  The Use of Linear Prediction for the Interpolation and Extrapolation
      of Missing Data and Data Gaps Prior to Spectral Analysis,
      Stephen B. Bowling and Shu Lai, Massachusetts Institute of Technology
      Lincoln Laboratory

1140  A New Autoregressive Spectrum Analysis Algorithm, Larry Marple,
      Advent Systems, Inc.

1200  Adjourn for lunch

Session II  Dr. Sherman Karp, DARPA

1330  ARMA Spectral Estimation: An Iterative Procedure, James A. Cadzow,
      Virginia Polytechnic Institute and State University

      ARMA Spectral Estimation: An Efficient Closed-Form Procedure,
      James A. Cadzow, Virginia Polytechnic Institute and State University

1350  Extrapolating Bandlimited Signals with Noise and Quantization,
      Kenneth Abend and Judith R. Platt, RCA Government Systems Division

1410  Accuracy of Spectral Estimates of Band-Limited Signals,
      William B. Gordon, Naval Research Laboratory

1430  Coffee Break

1500  Compensation of Autoregressive Spectral Estimates for the Presence of
      White Observation Noise, Steven Kay, Raytheon Company

1530  Order Determination for Autoregressive Spectral Estimation,
      M. Kaveh and S. P. Bruzzone, University of Minnesota

1600  Adjourn for the day


Thursday, 4 October

Session III  Dr. Donald Burlage, U.S.A. R&D Missile Command

0900  Difficulties Present in Algorithms for Determining the Rank and
      Proper Poles with Prony's Method, Michael L. VanBlaricum,
      ETI, Incorporated

0920  A Unifying Model for Spectral Estimation, Charles Byrne, The Catholic
      University of America, and Raymond Fitzgerald, Naval Research
      Laboratory

0940  A Comparison of the Burg and the Known-Autocorrelation Autoregressive
      Spectral Analysis of Complex Sinusoidal Signals in Additive White
      Noise, Robert W. Herring, Communications Research Centre

1000  Coffee Break

1040  A Two-Dimensional Maximum Entropy Spectral Estimator, Salim Roucos and
      D. G. Childers, College of Engineering, University of Florida

1100  Spectral Estimation and Signal Extrapolation in One and Two
      Dimensions, Anil K. Jain, University of California

1120  Antenna Spatial Pattern Viewpoint of MEM, MLM, and Adaptive Array
      Resolution, William F. Gabriel, Naval Research Laboratory

1200  Lunch

Session IV  Dr. David Kerr, NRL

1330  Aperture Sampling Processing for Ground Reflection Elevation
      Multipath Characterization, James E. Evans and David F. Sun,
      MIT Lincoln Laboratory

1350  Multiple Emitter Location and Signal Parameter Estimation,
      Ralph O. Schmidt, ESL, Incorporated

1410  The Maximum Entropy Spectral Estimator Used as a Radar Doppler
      Processor, Simon Haykin and Hing C. Chan, McMaster University,
      Ontario, Canada

1430  Coffee Break

1500  Applications for MESA and the Prediction Error Filter,
      William R. King, King Research

1520  The Maximum Entropy Method Applied to Radar Adaptive Doppler
      Filtering, J. H. Sawyers, Hughes Aircraft Company

1540  Complex Maximum Power Spectral Analysis of AFGL Magnetometer Data,
      Fougere, AFGL (Paper not published)


Friday, 5 October

0900  A Comparison of Solutions to the Workshop Problems,
      Dr. Lester A. Gerhardt (Co-chairman), Rensselaer Polytechnic Institute

1030  Coffee Break

1100  Workshop Panel Activity

1230  Adjourn the Workshop


PREFACE 


This is the second Spectrum Estimation Workshop sponsored by RADC to
provide  a  means  for  key  researchers  in  the  field  to  describe  their  work  and 
also  provide  a  means  for  comparing  the  work  of  various  researchers  using  a 
common  data  base  for  representative  problems  of  importance  to  the  Air  Force. 
This  report  is  a  collection  of  papers  that  were  submitted  for  presentation  at 
RADC's  Spectrum  Estimation  Workshop  held  3,  4,  and  5  October  1979  at  Griffiss 
Air Force Base, NY 13441. The papers were published as received by RADC and
have not been edited. Further, publication of these papers does not represent
approval or endorsement by the Rome Air Development Center or the U.S. Air
Force.

Proceedings of the first workshop are available from DDC, #A054650.

Participants  were  also  presented  with  a  set  of  sample  problems  called 
the  Spectral  Estimation  Experiment.  The  object  of  this  experiment  was  to 
establish  a  basis  for  comparison  of  the  wide  variety  of  techniques  available 
as  a  function  of  selected  applications  on  both  real  and  artificial  data  sets 
representing  specialized  problem  classes  which  are  of  interest  to  the 
government.  The  common  data  base  offers  several  additional  advantages. 

Four  problems  have  been  formulated  by  the  workshop  committee.  They  fall 
generally into the areas of radar, pattern recognition and system
identification.

The  detailed  description  of  the  problem  and  the  solutions  as  determined 
by  the  many  different  algorithms  employed  will  be  published  separately. 


SPECTRUM ESTIMATION WORKSHOP COMMITTEE

1.  Russel  Brown  (RADC/OCTS) 

2.  Edward  Christopher  (RADC/OCTS) 

3.  Lester  Gerhardt  (RPI/Co-chairman) 

4. Clarence Silfer (RADC/OCTS/Co-chairman)

5.  Paul  Van  Etten  (RADC/OCTS) 

6.  Haywood  Webb  (RADC/ISCP) 




AN  INTRODUCTION  TO  THE  SECOND  RADC 
SPECTRAL  ESTIMATION  WORKSHOP 


LESTER A. GERHARDT

Professor and Chairman
Electrical and Systems Engineering Department
Rensselaer Polytechnic Institute
Troy, NY 12181


Introduction 


As Co-Chairman of this Second RADC Spectral Estimation Workshop, it is
an honor and a pleasure for me to welcome all of you to this gathering. Having
been Co-Chairman of the First Workshop held last year, I have been privileged
to watch this field of Spectral Estimation as it has emerged from its newly
founded embryonic stage in the first workshop to one of increased maturity
this year. Many previously diverse approaches and fields have coalesced, and
common mathematical tools as well as problems have been identified. (This has
been further aided by other workshops such as the one scheduled for January
1980 at Arden House, Harriman, NY, sponsored jointly by the IEEE and
Geophysical Society.) Still, some major problems remain, such as the
determination of model order, and no single technique has been clearly
identified as being superior. However, new techniques and algorithms are still
sought, and the current Workshop offers you some of them. It also presents a
new and broader range of application papers with concrete results. Finally,
another comparison of methods will be made using a set of representative
problems utilizing a common data base.

It should be noted that the First Workshop was for many a jumping-off
point to the field and served as a means of focusing attention on this class
of spectral estimation problems. Since that time, many of the authors have
been engaged in actively sponsored government, industrial and academic
research, systems using MEM are being reduced to hardware/firmware, and
published papers relating to this subject area have substantially increased.
Moreover, there have been other publications, such as the IEEE Press
publication of Modern Spectrum Analysis methods, which have helped in
identifying the field as significant. The Workshop Committee would like to
think that our first get-together aided in the increased interest shown in
the field over the last year, and takes this opportunity to thank you for your
involvement and contribution towards that end.

Last year, my paper included a brief mathematical development of each of
the major techniques. This year this seems unnecessary, and it should be
sufficient to summarize the papers, grouping some common aspects and
identifying emerging trends, leaving the detailed accomplishments to the
authors themselves.




Summary of Technical Papers


Twenty-one papers form the basis of the technical sessions that follow.
There are eight papers authored by personnel from university or
university-related organizations, seven from industry, five from government,
and one jointly offered from university and government. The majority of
government papers originate from the Naval Research Laboratory (NRL). Compared
to last year, this represents an increase in the percentage of papers
originating from industry, perhaps indicative of a trend of the spectral
estimation techniques towards practical implementation.

As  one  reviews  the  papers  in  detail,  some  major  common  threads  or 
themes  appear,  and  some  general  observations  emerge. 

The technical content of the papers falls broadly into the two categories
of theory and application, with many papers incorporating both. At the
theoretical end, the first paper by Shore and Johnson is an excellent
treatment of Cross Entropy Spectral Analysis. This approach is useful to
estimate power spectra given a priori estimates of the spectra and new
information in the form of autocorrelation function samples, and reduces to
maximum entropy spectral analysis in special cases. It should help expose
unfamiliar users to the effectiveness and applicability of this approach. The
next two papers are concerned with the case in which only a limited number of
discrete time domain samples is available. The paper by Kolba and Parks
develops an implementation of a recursive estimation procedure by minimizing
the maximum error given a limited number of time samples but with a priori
knowledge of the bandwidth and time duration of the signal (time concentrated
signals), whereas the paper by Bowling and Lai describes a linear prediction
method to interpolate and extrapolate missing data using a spectrally
consistent estimate, this being done prior to spectral analysis. The latter
paper considers applications to radar for both real and simulated data. The
next paper, by Marple, is also concerned with radar as the area of
application, but is in major part a theoretical treatment of a new
autoregressive algorithm for spectral estimation using a least squares
approach which all but eliminates line splitting, a problem cited several
times at last year's Workshop. Jim Cadzow, one of several principal
researchers in this field in recent years, next offers an ARMA autocorrelation
estimator method (AEM) for rational spectral density estimation which permits
the use of poles and zeroes, yielding a more robust procedure. Following this
is the paper by Abend and Platt, which extends Cadzow's method (presented at
the 1978 Workshop) using an iterative steepest descent method to invert the
sometimes ill-conditioned matrix, and treats the problem of noise and
quantization.

The paper by Gordon concentrates on the topic of accuracy of spectral
estimates and particularly discusses the effects of noise. Considering a
bandlimited signal in additive white noise, Gordon analyzes the mean squared
error of the linear spectral estimate as a function of the time bandwidth
product and signal to noise ratio. The next paper by Kay is also directed at
noise problems and takes the approach of compensating autoregressive
spectral estimates for a signal embedded in white noise. Kay assumes the noise
variance is known and with that compares his compensation technique with the
ARMA approach to handling noise.

The last two papers in the basic theory group are directed at the
classical problem of order determination. Kaveh and Bruzzone use Akaike's
Information Criterion (AIC) to determine the order of the autoregressive
spectral estimate, while Van Blaricum offers the eigenvalue method and the
HFTI method as means to determine the order (or alternatively rank and poles)
associated with Prony's method. This still remains a difficult and open
problem with no overall solution to the selection of optimum cutoff or to
how the effects of noise may be handled.
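
As a rough illustration of AIC-based order selection, the following Python sketch picks an autoregressive order by minimizing the commonly quoted form AIC(m) = N ln(sigma2_m) + 2m. It is a sketch under stated assumptions, not the procedure of Kaveh and Bruzzone: the function name `ar_order_by_aic`, the use of the Levinson-Durbin recursion on biased autocorrelation estimates, and the simulated AR(2) test signal are all choices made here for illustration.

```python
import numpy as np

def ar_order_by_aic(x, max_order):
    """Pick an AR order by minimizing AIC(m) = N*ln(sigma2_m) + 2*m.

    sigma2_m is the order-m prediction error variance obtained from the
    Levinson-Durbin recursion on biased autocorrelation estimates.
    Returns (best_order, aic_values, sigma2_values).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0..max_order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(max_order + 1)])
    a = np.zeros(max_order + 1)   # a[1..m] hold the current predictor coefficients
    sigma2 = [r[0]]
    for m in range(1, max_order + 1):
        # Reflection coefficient for order m.
        k = (r[m] - np.dot(a[1:m], r[m - 1:0:-1])) / sigma2[-1]
        a_prev = a.copy()
        a[m] = k
        a[1:m] = a_prev[1:m] - k * a_prev[m - 1:0:-1]
        sigma2.append(sigma2[-1] * (1.0 - k * k))
    sigma2 = np.array(sigma2)
    aic = n * np.log(sigma2) + 2.0 * np.arange(max_order + 1)
    return int(np.argmin(aic)), aic, sigma2

# Hypothetical test signal: an AR(2) process driven by white noise.
rng = np.random.default_rng(0)
e = rng.standard_normal(2000)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 1.5 * x[t - 1] - 0.9 * x[t - 2] + e[t]

order, aic, sigma2 = ar_order_by_aic(x, max_order=10)
```

The error variance can only decrease as the order grows, so the 2m penalty is what stops the selected order from running to `max_order`. The criterion may still pick an order above the true one, which is one face of the difficulty noted above.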

The next two papers are almost exclusively directed at developing a
unifying theoretical base or model, or at a comparison of techniques. The
paper by Byrne and Fitzgerald stresses comparison of techniques, including
the implicit models of Cadzow and Figueiredo among others, and attempts to
establish that these models are related and in fact covered by their unifying
model, which serves as both a minimum energy extrapolation and least mean
square approximation of the spectrum. It serves to establish the trend of
searching for commonality in approaches. This is further enhanced by
Herring's paper, which compares Burg's method with known-autocorrelation
autoregressive spectral analysis in white noise.

The two papers that follow, the first by Childers and Roucos, and the
second by Jain, both deal with 2-D spectral estimation and continue a theme
begun at the first Workshop. The first paper develops a 2-D estimation
algorithm, whereas Jain's paper deals with one and two dimensional estimation.
Jain uses a minimum norm least squares (MNLS) formulation and treats the
problem appropriately as iterative matrix inversion using a gradient approach.

The remaining seven papers are primarily applications oriented, although
they include significant theoretical developments as well. The first three of
this set of application papers focus on the use of spatial information.
Gabriel compares MEM and MLM spectral estimation methods to their adaptive
array counterparts popular in the adaptive antenna array field, both being
recognized as a matrix inversion problem. This similarity reinforces points
made in my introductory talk last year, and serves to bring these fields
closer together. The paper by Sun and Evans is directed at multipath, and
suggests the use of aperture sampling to improve angular resolution and
tracking. It also applies the high resolution MLM and MEM methods to
spatial data. Finally, in this category, is the paper by Schmidt, which
deals with multiple emitter location and again deals with a spatially
distributed application.




The last four applications are quite diversified and demonstrate the
increased breadth of spectral estimation. Last year, for example, the
applications were almost solely radar oriented. This year, aside from the
spatially distributed applications above, the paper by Haykin and Chan uses
maximum entropy estimation to function as a Doppler processor. They show
that MEM is only slightly suboptimal relative to conventional processors using
the DFT for additive white noise, but for additive clutter (narrow spectral
bandwidth) MEM is much better for low Doppler targets. The paper by King
discusses several applications for MESA and his prediction error filter,
including clutter reduction, signal detection, etc. Sawyers' paper applies the
MEM to adaptive digital filtering, directly tying these two fields, and
finally Fougere's paper deals with estimating the dominant frequencies and
polarization patterns of magnetic pulsation events using both linear and
nonlinear methods, which are also claimed to eliminate line splitting and
shifting.


General  Comments 

From the previous descriptions, it should be apparent that there continue
to be new theoretical developments, as well as advances on previously
developed techniques to improve accuracy, reduce the effect of noise, and
yield improved computational capability. Two main problems of order
determination and line splitting are addressed again this year. However, in at
least the two papers by Fougere and Marple, the line splitting problem is
fairly well resolved, whereas the establishing of the model order needed
(degrees of freedom, poles, etc.), as exemplified in the papers by
Van Blaricum and Kaveh, remains a substantial difficulty. A mix of linear
and nonlinear techniques continues also, with no one method cited as clearly
superior. Work on extensions to two dimensional processing remains active
and of substantial interest.

In an effort to provide some motivation in other directions (or stimulate
controversy or opposition, as the case may be), let me say that I feel
there is still too much emphasis on treating additive white noise exclusively.
This is exemplified in papers by Gordon, Kay, and Herring, among others.
Noting the substantial differences in performance of the MEM obtained in the
paper by Haykin and Chan for white noise vs. narrowband noise, certainly some
work is needed to explore the effects of non-spectrally flat noise. There also
remains a continued special concentration on the mean squared error criterion,
such as in Gordon, Marple, and Jain. Although it is very effective, perhaps
other criteria warrant study.

The reader should carefully observe that representation continues from
not only government, industry, and the university, but also from electrical
engineering, geophysics, etc. It is rewarding to see, nonetheless, that
several papers, more than in the past year, now are strongly interrelating
the fundamental problems of spectral estimation with those of other fields,
a suggestion made by myself and others last year. For example, the papers
by Gabriel, Bowling and Lai, Abend and Platt, and Kolba and Parks all cover
to one extent or another the relation between spectral estimation and
adaptive techniques, including gradient methods for iteratively inverting
a matrix, etc. In many more, there is a solid appreciation of the basic
nature of the matrix inversion problem as it dominates the field of spectral
estimation.

Insofar as applications papers are concerned, there is an integral mix
of theory and applications, a broader diversification of applications
than before, and more concrete results. The effectiveness of the techniques
will best be measured in the comparison of results on the common data sets
provided to all participants, a discussion better left to another day.

In conclusion, let me restate my welcome to the authors, including those
returning for a repeat performance and those here for the first time, and to
the general audience. I trust the forthcoming technical sessions and
subsequent comparative problem sessions will be as fruitful and rewarding to
you as they have been to the committee members and myself in helping to
prepare them.


MINIMUM  CROSS-ENTROPY  SPECTRAL  ANALYSIS  - 

INTRODUCTION  AND  EXAMPLES 


JOHN  E.  SHORE 
RODNEY  W.  JOHNSON 

Naval  Research  Laboratory 
Washington,  D.C.  20375 

Abstract 

The  principle  of  minimum  cross  entropy  (minimum  directed  divergence)  is 
summarized,  discussed,  and  applied  to  the  classical  problem  of  estimating 
power  spectra  given  samples  of  the  autocorrelation  function.  This  new 
approach  reduces  to  maximum-entropy  spectral  analysis  (MESA)  in  certain 
special  cases  but,  in  contrast  to  MESA,  permits  use  of  a  prior  estimate  of 
the  power  spectrum.  Examples  of  applications  are  given. 

1. Introduction

Work reported in [1]-[2] showed that the principle of minimum
cross-entropy  (minimum  directed  divergence)  provides  a  correct,  general 
method  of  inductive  inference  in  terms  of  continuous  probability  densities 
when  given  a  prior  density  and  information  about  the  "true"  density  in  the 
form  of  expected  values.  Subsequent  work  [3]  showed  how  cross-entropy 
minimization  can  be  used  to  estimate  power  spectra  when  given  a  prior 
estimate  of  the  spectrum  and  new  information  in  the  form  of  autocorrelation 
function  samples.  This  new  technique  reduces  to  maximum  entropy  spectral 
analysis  [4]-[5]  in  certain  special  cases.  In  this  paper  we  summarize  the 
new  technique  and  we  give  examples  of  its  application. 

2.  Cross-Entropy  Minimization 

Let $x$ denote a single state of some system that has a set $D$ of possible
system states and a probability density $q^\dagger(x)$ of states. Let
$\mathcal{D}$ be the set of all probability densities $q$ on $D$ such that
$q(x) \ge 0$ for $x \in D$ and

$$\int_D dx \, q(x) = 1 . \qquad (1)$$

We assume that the existence of $q^\dagger \in \mathcal{D}$ is known but that
$q^\dagger$ itself is unknown. The density $q^\dagger$ is sometimes known as a
"true" density.

Suppose $p \in \mathcal{D}$ is a prior density that is our current estimate of
$q^\dagger$, and suppose we gain new information about $q^\dagger$ in the form
of a set of expected values

$$\int_D dx \, q^\dagger(x) g_r(x) = \langle g_r \rangle = \bar{g}_r \qquad (2)$$

for a known set of bounded functions $g_r(x)$ and numbers $\bar{g}_r$, indexed
by $r$. Now, because the constraints (2) do not determine $q^\dagger$
completely, they are satisfied not only by $q^\dagger$ but by some subset of
densities $\mathcal{I} \subseteq \mathcal{D}$. Which single density should we
choose from this subset to be our new estimate of $q^\dagger$, and how should
we use the prior $p$ and the new information (2) in making this choice?


The solution to this inference problem is obtained by minimizing a
functional $H(q,p)$ called cross-entropy,

$$H(q,p) = \int_D dx \, q(x) \log\bigl(q(x)/p(x)\bigr) . \qquad (3)$$

Specifically, of all the densities $q' \in \mathcal{I}$ that satisfy the
constraints (2), we choose the one with the smallest cross-entropy $H(q',p)$
with respect to the prior $p$. Stated differently, the posterior density $q$
satisfies

$$H(q,p) = \min_{q' \in \mathcal{I}} H(q',p) ,$$

where $\mathcal{I} \subseteq \mathcal{D}$ comprises all of the densities that
satisfy the constraints (2).
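
As a numerical illustration of (3), the following Python sketch evaluates the cross-entropy on a discretized domain, where the integral becomes a sum over grid cells. The function name `cross_entropy`, the grid on [0, 1), and the two example densities are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def cross_entropy(q, p, dx=1.0):
    """Discretized form of Eq. (3): sum of q * log(q / p) * dx over grid cells.

    q and p are density values on a common grid with cell width dx.
    Cells where q == 0 contribute zero (the limit of q*log(q) as q -> 0).
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])) * dx)

# Hypothetical example: a uniform prior and a peaked candidate on [0, 1).
n = 100
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
p = np.ones(n)                               # uniform density on [0, 1)
q = np.exp(-0.5 * ((x - 0.5) / 0.1) ** 2)    # unnormalized bump
q /= q.sum() * dx                            # normalize so sum(q) * dx == 1

h_qp = cross_entropy(q, p, dx)   # positive, since q differs from p
h_pp = cross_entropy(p, p, dx)   # zero, since the two densities coincide
```

For normalized densities the value is never negative and vanishes only when the candidate equals the prior, which is why minimizing it selects the admissible density closest to the prior in this sense.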


Mathematically, the solution is obtained using the method of Lagrangian
multipliers and standard techniques from the calculus of variations. The
minimization condition is

$$\log\bigl(q(x)/p(x)\bigr) + 1 + \lambda_0 + \sum_r \beta_r g_r(x) = 0 , \qquad (4)$$

where the $\beta_r$ are Lagrangian multipliers corresponding to the
constraints (2), and where $\lambda_0$ is a Lagrangian multiplier
corresponding to the normalization constraint (1). The solution of (4) is

$$q(x) = p(x) \exp\Bigl(-\lambda - \sum_r \beta_r g_r(x)\Bigr) , \qquad (5)$$

where $\lambda = \lambda_0 + 1$. It is convenient to write (5) in the form

$$q(x) = Z^{-1} p(x) \exp\Bigl(-\sum_r \beta_r g_r(x)\Bigr) , \qquad (6)$$

where $Z$ is the "partition function",

$$Z = \exp(\lambda) = \int_D dx \, p(x) \exp\Bigl(-\sum_r \beta_r g_r(x)\Bigr) . \qquad (7)$$

The values of the multipliers $\beta_r$ are determined by the known
expectation values $\bar{g}_r$ in (2). One can express the posterior $q$
directly in terms of the values $\bar{g}_r$ by solving the equations

$$-\frac{\partial}{\partial \beta_r} \log(Z) = \bar{g}_r \qquad (8)$$

for the $\beta_r$, or by substituting (6) into the constraint equations (2)
and solving for the $\beta_r$. Such solutions are often difficult or
impossible to obtain analytically, but one can obtain them computationally in
general [1, Appendix B], [6].
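
The computational route can be sketched on a discretized domain. The Python code below is an illustration under stated assumptions and is not the procedure of [1, Appendix B] or [6]: it applies Newton's method to the convex function log Z plus the sum of the multipliers times their target values, whose stationarity condition is the set of equations (8). The function names, the grid, and the single example constraint are hypothetical.

```python
import numpy as np

def posterior_from_beta(p, g, beta, dx):
    """Eqs. (6)-(7) on a grid: q = p * exp(-sum_r beta_r g_r) / Z."""
    w = p * np.exp(-beta @ g)
    return w / (w.sum() * dx)

def min_cross_entropy(p, g, g_bar, dx, iters=50, tol=1e-10):
    """Solve Eq. (8) for beta by Newton's method (an assumed solver choice).

    p     : prior density values on the grid, shape (n,)
    g     : constraint functions g_r on the grid, shape (m, n)
    g_bar : target expected values, shape (m,)
    dx    : grid cell width
    """
    p = np.asarray(p, dtype=float)
    g = np.asarray(g, dtype=float)
    g_bar = np.asarray(g_bar, dtype=float)
    beta = np.zeros(g.shape[0])
    for _ in range(iters):
        q = posterior_from_beta(p, g, beta, dx)
        mean = (g * q).sum(axis=1) * dx          # <g_r> under the current q
        grad = g_bar - mean                      # residual of Eq. (8)
        if np.max(np.abs(grad)) < tol:
            break
        centered = g - mean[:, None]
        hess = (centered * q) @ centered.T * dx  # covariance of the g_r under q
        beta = beta - np.linalg.solve(hess, grad)
    return posterior_from_beta(p, g, beta, dx), beta

# Hypothetical example: uniform prior on (0, 1), one constraint <x> = 0.3.
n = 200
dx = 1.0 / n
x = (np.arange(n) + 0.5) * dx
prior = np.ones(n)
q, beta = min_cross_entropy(prior, x[None, :], [0.3], dx)
mean_x = float((x * q).sum() * dx)
```

In this example the posterior is the exponentially tilted prior of form (6), with a positive multiplier because the target mean lies below the prior mean of 0.5. A plain Newton iteration has no safeguard against overshooting, so a harder problem would likely need a line search.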

The principle of minimum cross-entropy was first proposed by Kullback
[7], who called it a principle of minimum directed divergence or minimum
discrimination information. The term cross-entropy is due to Good [8].
Cross-entropy can be characterized axiomatically [9] in terms of properties
that are desirable for an information measure [9], [10], and it can be
argued  [11]  that  cross-entropy  measures  the  amount  of  information  necessary 
to  change  a  prior  p  into  the  posterior  q.  The  principle  of  cross-entropy 
minimization  then  follows  intuitively.  This  justification  is  somewhat 
indirect  —  it  is  based  on  a  formal  description  of  what  is  required  of  an 
information  measure  rather  than  on  a  formal  description  of  what  is  required 
of  a  method  for  taking  new  information  into  account. 

Recently, we obtained a stronger justification [1]-[2]. Our approach
was to formalize the requirements of inductive inference directly in terms
of four consistency axioms that make no reference to information measures or
properties of information measures. All of the axioms are based on a single
fundamental principle: If a problem can be solved in more than one way, the
results should be consistent. We were then able to prove that the principle
of minimum cross-entropy provides a correct, general method of inductive
inference in the following sense: Given a prior density and new information
in the form of constraints on expected values, there is only one posterior
density satisfying these constraints that can be chosen in a manner that
satisfies the axioms; this unique posterior can be obtained by minimizing
cross-entropy.

The  principle  of  minimum  cross-entropy  is  a  generalization  of  the 
principle of maximum entropy [12]-[13]. When the prior density is uniform,
cross-entropy  minimization  reduces  to  entropy  maximization. 

3. Minimum-Cross-Entropy Probability Densities for
Stochastic Signals Given Expected Spectral Powers

Consider time-domain signals of the form

$$s(t) = \sum_{k=1}^{n} a_k \cos(\omega_k t) + b_k \sin(\omega_k t) , \tag{9}$$

with non-zero $\omega_k$ that need not be uniformly spaced. These are
discrete-spectrum, band-limited signals without DC components. (The
assumption of no DC term, which is reasonable for many signal processing
applications, is made for mathematical convenience.) The power at each
frequency is given by the variables $x_k$.

If we consider the $x_k$ to be random variables, we may describe a stochastic
signal in terms of a joint probability density $q(x)$, where we write $x$ for
$x_1, x_2, \ldots, x_n$. Instead of constantly referring to $q(x)$ as the spectral
power probability density of a stochastic signal, we will informally refer
to $q(x)$ as a "signal."


Now consider the problem of choosing $q(x)$ when we know the expected
power $P_k$ at each frequency,

$$P_k = \langle x_k \rangle = \int dx\, x_k\, q(x) , \tag{11}$$

where $dx = dx_1 dx_2 \ldots dx_n$. To apply the principle of minimum
cross-entropy, we need a prior density $p(x)$ to represent our state of
knowledge before we learn even (11). Since in any real situation there will
be a physical limit on the magnitude of the $x_k$, we assume that the domain
of $x$ is bounded. We may therefore use a uniform prior density. For a more
detailed analysis of this assumption, see [3].


We choose $q(x)$ by minimizing cross-entropy subject to the constraints
(1) and (11). The result (see (5)) is

$$q(x) = A \exp\Big(-\sum_k \beta_k x_k\Big) ,$$

where the $\beta_k$ are the Lagrangian multipliers corresponding to (11), and
where the uniform prior and the Lagrangian multiplier corresponding to (1)
have been absorbed into the constant $A$,

$$A^{-1} = \int dx_1 dx_2 \ldots dx_n \exp\Big(-\sum_k \beta_k x_k\Big) . \tag{12}$$

Provided that the $P_k$ are much less than the maximum values of the $x_k$, we
may use integration limits $(0, \infty)$ in (12); this leads to $A = \prod_k \beta_k$.

In terms of the multipliers, the power constraints (11) become

$$P_k = \beta_k \int_0^\infty dx_k\, x_k \exp(-\beta_k x_k) = 1/\beta_k .$$

The posterior $q(x)$ is therefore

$$q(x) = \prod_{k=1}^{n} (1/P_k) \exp(-x_k/P_k) . \tag{13}$$

Thus, $q(x)$ is a multivariate exponential; each spectral power $x_k$ is
exponentially distributed with mean $P_k$.

4. Minimum-Cross-Entropy Power Spectra Given Autocorrelation
Information and a Prior Estimate of the Power Spectrum

Let some unknown signal $q^\dagger(x)$ have a power spectrum $G(f)$ and
autocorrelation function $R(t)$. Suppose we obtain information about $G$ in the
form of a set of samples of the autocorrelation function $R(t_r)$,

$$R_r = R(t_r) = \int_{-W}^{W} df\, G(f) \exp(2\pi i t_r f) , \tag{14}$$

$r = 1, \ldots, m$. We do not assume that the $t_r$ are equally spaced. If the
frequency spectrum is discrete, as we have assumed in (9), we can express
$G(f)$ as

$$G(f) = \sum_{k=-n}^{n} G_k\, \delta(f - f_k) ,$$

where $f_k = -f_{-k}$, $G_k = G_{-k} = G(f_k)$, and $G_0 = 0$. Then (14) becomes

$$R_r = \sum_{k=-n}^{n} G_k \exp(2\pi i t_r f_k) ,$$

which we prefer to express in the non-complex form

$$R_r = \sum_{k=1}^{n} G_k c_{rk} , \tag{15}$$

where

$$c_{rk} = 2 \cos(2\pi t_r f_k) . \tag{16}$$

Since the $G_k$ satisfy

$$G_k = \langle x_k \rangle = \int dx\, x_k\, q^\dagger(x) , \tag{17}$$

we can rewrite (15) as

$$R_r = \int dx\, \Big(\sum_k x_k c_{rk}\Big)\, q^\dagger(x) . \tag{18}$$


This has the form of known expected values of the unknown density $q^\dagger(x)$, and
we may therefore use the principle of minimum cross-entropy to infer an
estimate of $q^\dagger$. In terms of the general form (2), the functions $g_r$ are
$g_r = \sum_k x_k c_{rk}$. This minimum cross-entropy problem differs from the
one discussed in Section 3 in that the Section 3 problem assumed knowledge
of the expected spectral powers in the form (17), whereas in this problem we
have only the form (18). Since typically $m < n$, knowledge of (18) provides
less information than does (17).


Now suppose we obtain the autocorrelation information (18) when we
already have an estimate $P_k$ of the power spectrum $G_k$ (17). We reflect
this prior information as a prior density with the exponential form (13)

$$p(x) = \prod_{k=1}^{n} (1/P_k) \exp(-x_k/P_k) , \tag{19}$$

which itself is the minimum cross-entropy density, with respect to a uniform
prior, given knowledge of the expected spectral powers $P_k$.

We then solve the problem of estimating $G_k$, given a prior estimate
$P_k$ and new autocorrelation information (18), by assuming the prior density
(19) and minimizing cross-entropy subject to the constraints (18) and (1).
The result is

$$q(x) = p(x) \exp\Big(-\lambda - \sum_{r=1}^{m} \beta_r \sum_{k=1}^{n} x_k c_{rk}\Big) , \tag{20}$$

where the $\beta_r$ are $m$ Lagrangian multipliers corresponding to the
autocorrelation constraints (18). For convenience, we define

$$u_k = \sum_{r=1}^{m} \beta_r c_{rk} , \tag{21}$$

so that (20) can be written as

$$q(x) = p(x) \exp\Big(-\lambda - \sum_k u_k x_k\Big) = e^{-\lambda} \prod_k (1/P_k) \exp(-(u_k + 1/P_k) x_k) . \tag{22}$$

Since $\lambda$'s value must be such that $q(x)$ satisfies the normalization
constraint (1), (22) becomes

$$q(x) = \prod_{k=1}^{n} (u_k + 1/P_k) \exp(-(u_k + 1/P_k) x_k) . \tag{23}$$

For our posterior estimate $Q_k$ of the power spectrum, we use the density
(23) to compute $Q_k = \langle x_k \rangle = 1/(u_k + 1/P_k)$, or

$$Q_k = \frac{1}{(1/P_k) + \sum_r \beta_r c_{rk}} , \tag{24}$$

where the multipliers $\beta_r$ are determined by the requirement that the
$Q_k$ satisfy the autocorrelation constraints (15)

$$R_r = \sum_{k=1}^{n} Q_k c_{rk} . \tag{25}$$

The minimum cross-entropy result (24)-(25) can also be derived by arguments
concerning the cross-entropy between the input and output of linear
filters [3].
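The paper does not say here how (24)-(25) are solved for the $\beta_r$. One possible approach, sketched below under stated assumptions, is a damped Newton iteration on the convex function $\sum_r \beta_r R_r - \sum_k \log(1 + P_k u_k)$, whose gradient is the constraint residual $R_r - \sum_k Q_k c_{rk}$. The function name `mce_spectrum`, the backtracking rule, and the four-frequency test data are hypothetical and are not taken from the source.

```python
import numpy as np

def mce_spectrum(P, C, R, iters=200, tol=1e-12):
    """Solve (24)-(25) for the posterior spectrum Q and multipliers beta.

    P : (n,) prior spectral estimate, C : (m, n) matrix c_rk, R : (m,) data.
    """
    P, C, R = (np.asarray(a, dtype=float) for a in (P, C, R))

    def dual(b):
        d = 1.0 + P * (b @ C)              # must stay positive so that Q_k > 0
        return np.inf if np.any(d <= 0) else b @ R - np.log(d).sum()

    beta = np.zeros(C.shape[0])
    for _ in range(iters):
        Q = 1.0 / (1.0 / P + beta @ C)     # equation (24)
        grad = R - C @ Q                   # residual of the constraints (25)
        if np.max(np.abs(grad)) < tol:
            break
        hess = (C * Q**2) @ C.T
        step = -np.linalg.solve(hess, grad)
        f0, slope, t = dual(beta), grad @ step, 1.0
        # Backtrack until the step is feasible and decreases the dual.
        while dual(beta + t * step) > f0 + 1e-4 * t * slope + 1e-10 and t > 1e-12:
            t *= 0.5
        beta = beta + t * step
    return 1.0 / (1.0 / P + beta @ C), beta

# Hypothetical data: four frequencies, lags 0 and 1, flat prior.
f = np.array([0.1, 0.2, 0.3, 0.4])
C = 2.0 * np.cos(2.0 * np.pi * np.outer([0, 1], f))   # c_rk from (16)
G_true = np.array([1.0, 2.0, 1.0, 1.0])
R = C @ G_true
Q, beta = mce_spectrum(np.ones(4), C, R)
```

When the prior already satisfies the constraints, the iteration stops immediately with all $\beta_r = 0$, so the posterior equals the prior.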

Suppose that the prior estimates are uniform ($P_k = P$), and
suppose that one of the autocorrelation samples, say $R_1$, is for zero lag
($t_1 = 0$). Then (24) reduces to

$$Q_k = \frac{1}{\sum_{r=1}^{m} \beta_r c_{rk}} , \tag{26}$$

where the constant $1/P$ has been absorbed into the multiplier $\beta_1$ since
$c_{1k} = 2$ holds for all $k$ (see (16)). This is identical to the standard
result for maximum entropy spectral analysis (MESA), except that the MESA
equations are usually expressed in complex form (see [5], p. 9, for
example). Therefore, (26) is also identical to the results obtained by
autoregressive, linear predictive, and minimum least squares techniques [5],
[14]. The reduction of (24) to (26) reflects the general equivalence of
cross-entropy minimization and entropy maximization in the case of uniform
priors. When the prior spectral power estimate is not uniform, MESA
and cross-entropy minimization (24) give different results. For a more
detailed comparison, see [3].


5.  Examples 


In this section we present some numerical examples in which
conventional maximum-entropy spectral estimates are compared with
minimum-cross-entropy estimates that take into account prior information
about the spectrum. In each example, autocorrelations at a small number of
equally spaced lags were computed from an assumed "true" spectrum; then
maximum-entropy and minimum-cross-entropy spectra were computed from the
autocorrelations and plotted.
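The first step of each example, computing autocorrelations from an assumed spectrum, follows directly from (15)-(16). The sketch below applies it to the spectrum of the second example in this section (white background of .02 with a peak of 1.02 at .215, plus a signal of 1 at .165); the helper name `autocorrelations` is an assumption made for illustration, and the result should agree with the values tabulated for that example.

```python
import numpy as np

def autocorrelations(G, f, lags):
    """R_r = sum_k G_k c_rk with c_rk = 2 cos(2 pi t_r f_k), as in (15)-(16)."""
    c = 2.0 * np.cos(2.0 * np.pi * np.outer(lags, f))
    return c @ G

# Fifty equally spaced frequencies .005, .015, ..., .495 (unit lag spacing).
f = 0.005 + 0.01 * np.arange(50)
background = np.where(np.isclose(f, 0.215), 1.02, 0.02)
signal = np.where(np.isclose(f, 0.165), 1.0, 0.0)

R = autocorrelations(background + signal, f, np.arange(6))
print(np.round(R, 4))
```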


For the first example, the original spectrum is the sum of a
"background" term, approximating 1/f noise, and a "signal" term
corresponding to a sinusoidal signal at a fixed frequency. The background
term is given by

$$G_k^{(b)} = .01/f_k , \quad (k = 1, \ldots, 50)$$

for fifty equally spaced frequencies $f_k = (.005, .015, \ldots, .495)$ between
0 and 0.5 (which is the Nyquist frequency: we take the spacing between
autocorrelation lags to be unity). The signal term is given by


(fk  =  .105) 
otherwise 




The sum is shown in Fig. 1; the first few corresponding autocorrelations
$R_r$ are as follows:

    t_r =  0        1        2       3       4       5
    R_r =  15.7511  11.6149  7.8699  4.5411  2.0145  1.1413 .

The maximum entropy spectrum computed from these six autocorrelations is
shown in Fig. 2. For the minimum-cross-entropy calculation, the background
term $G_k^{(b)}$ has been used as the prior spectral estimate; the resulting
posterior is shown in Fig. 3. As one might expect, the 1/f background is
considerably better estimated in Fig. 3 than in Fig. 2. More important,
however, there is a clearly discernible peak in Fig. 3 corresponding to the
sinusoidal signal at frequency .105; no such peak is evident in Fig. 2.

For the second example, spectral powers are shown at the same
frequencies as for the first, autocorrelations are computed for the same
lags, and the original spectrum is again the sum of a "background" term
$G_k^{(b)}$ and a "signal" term $G_k^{(s)}$. In this example, the background consists
of white noise plus a peak corresponding to a sinusoid at frequency .215:

$$G_k^{(b)} = \begin{cases} 1.02 & (f_k = .215) \\ .02 & \text{otherwise} . \end{cases}$$

The signal term consists of a nearby, similar peak at frequency .165:

$$G_k^{(s)} = \begin{cases} 1 & (f_k = .165) \\ 0 & \text{otherwise} . \end{cases}$$

The original spectrum is shown in Fig. 4, the autocorrelations are

    t_r =  0       1       2        3        4       5
    R_r =  6.0000  1.4544  -2.7732  -3.2248  0.2032  2.6900

and the maximum-entropy spectrum is shown in Fig. 5. For the
minimum-cross-entropy calculation, the background term $G_k^{(b)}$ has again been
taken as a prior spectral estimate. The posterior estimate is shown in Fig.
6. The information in the prior has permitted the resolution of the
"expected" peak at frequency .215 from the "unexpected" peak at frequency
.165. In the maximum-entropy estimate, by contrast, the two peaks are
coalesced into a single peak at about the center frequency, .190.

References


1. J. E. Shore and R. W. Johnson, December, 1978, "Axiomatic Derivation of
the Principle of Maximum Entropy and the Principle of Minimum
Cross-Entropy," NRL Memorandum Report 3898.

2. J. E. Shore and R. W. Johnson, "Axiomatic Derivation of the Principle
of Maximum Entropy and the Principle of Minimum Cross-Entropy," IEEE
Trans. Inform. Theory, to be published.

3. J. E. Shore, January, 1979, "Minimum Cross-Entropy Spectral Analysis,"
NRL Memorandum Report 3921.

4. J. P. Burg, 1967, "Maximum Entropy Spectral Analysis," presented at the
37th Annual Meeting Soc. of Exploration Geophysicists, Oklahoma City,
Okla.

5. J. Burg, 1975, "Maximum Entropy Spectral Analysis," Ph.D. Dissertation,
Stanford University (University Microfilms No. 75-25,499).

6. R. W. Johnson, 1979, "Determining Probability Distributions by Maximum
Entropy and Minimum Cross-Entropy," APL Quote Quad 9, No. 4 (APL 79
conf. proceedings), pp. 24-29.

7. S. Kullback, 1959, Information Theory and Statistics, Wiley, New York.

8. I. J. Good, 1963, "Maximum Entropy for Hypothesis Formulation,
Especially for Multidimensional Contingency Tables," Annals Math. Stat.
34, pp. 911-934.

9. R. Johnson, "Axiomatic Characterization of the Directed Divergences and
Their Linear Combinations," IEEE Trans. Inform. Theory, to be published.

10. A. Hobson and B. Cheng, 1973, "A Comparison of the Shannon and Kullback
Information Measures," J. Stat. Phys. 7, No. 4, pp. 301-310.

11. A. Hobson, 1969, "A New Theorem of Information Theory," J. Stat. Phys.
1, No. 3, pp. 383-391.

12. E. T. Jaynes, 1957, "Information Theory and Statistical Mechanics I,"
Phys. Rev. 106, pp. 620-630.

13. E. T. Jaynes, "Prior Probabilities," IEEE Trans. Systems Science and
Cybernetics SSC-4, 1968, pp. 227-241.

14. A. van den Bos, "Alternative Interpretation of Maximum Entropy Spectral
Analysis," IEEE Trans. Inform. Theory IT-17, 1971, pp. 493-494.


Figures

FIGURE 1. Original Spectrum for First Example (plot of spectral power versus frequency)

FIGURE 2. Maximum-Entropy Spectrum for First Example (plot of spectral power versus frequency)

FIGURE 3. Minimum-Cross-Entropy Spectrum for First Example (plot of spectral power versus frequency)

FIGURE 5. Maximum-Entropy Spectrum for Second Example (plot of spectral power versus frequency)

FIGURE 6. Minimum-Cross-Entropy Spectrum for Second Example (plot of spectral power versus frequency)


OPTIMAL ESTIMATION FOR BANDLIMITED,
TIME-CONCENTRATED SIGNALS


D.P. KOLBA AND T.W. PARKS


Department of Electrical Engineering
Rice University
Houston, Texas


Abstract 

A new estimation procedure for bandlimited, time-concentrated signals
is described. The method optimally estimates desired measurements of an
unknown signal by minimizing a maximum error. The information available for
the estimation consists of a limited number of time samples and knowledge
about the bandwidth and approximate time duration of the signal. An
example will compare the new method with existing bandlimited estimation
methods. The requirements on sampling rate and the effects of errors (noise)
in the data are discussed. A recursive implementation of the estimation is
presented.

Introduction

In the calculation of the spectrum of a discrete time signal,

$$X(f) = \sum_{n=-\infty}^{\infty} x(n)\, e^{-j 2\pi f n} , \tag{1}$$

problems arise when only a finite portion of the signal is available. If
no additional information is available, the spectral estimate is computed
from a windowed version of the data [1]. If additional information about
x(n) is known, such as knowing x(n) is bandlimited or bandlimited and
time-concentrated, then this additional knowledge should be used to give a
better spectral estimate. The additional information may be incorporated into
a new direct estimate for the spectrum. Alternatively, the additional
information can be used to extrapolate the given segment of data. The transform
of this extrapolated signal is then the desired estimate of the spectrum.
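For reference, the baseline mentioned above (evaluating (1) on a windowed finite segment) can be sketched as follows. The function name `dtft`, the Hann window, and the test tone are assumptions chosen for illustration; the paper does not specify a window.

```python
import numpy as np

def dtft(x, n, f):
    """X(f) = sum_n x(n) exp(-j 2 pi f n), summed over the available samples only."""
    x, n, f = np.asarray(x), np.asarray(n), np.atleast_1d(f)
    return np.exp(-2j * np.pi * np.outer(f, n)) @ x

# Hypothetical finite segment x(-M), ..., x(M) of a tone at normalized frequency 0.1.
M = 8
n = np.arange(-M, M + 1)
segment = np.cos(2.0 * np.pi * 0.1 * n)
freqs = np.linspace(0.0, 0.5, 101)
X_windowed = dtft(segment * np.hanning(n.size), n, freqs)
```

Every sample outside the segment is implicitly treated as zero here, which is the limitation that the extrapolation methods discussed next are meant to address.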

Extrapolation and spectral estimation for bandlimited signals have been
studied recently in [2-5]. A finite data segment, x(-M), x(-M+1), ..., x(M),
of a bandlimited signal is known. From this given information, the
extrapolation and spectral estimation are derived. In [2], the extrapolation
is shown to be the minimum energy extension of the finite data segment to
a bandlimited signal. This method will have the extrapolation as concentrated
on the measurements as possible. Therefore, this method will be called
concentrated on measurements (COM) estimation. The methods of [4] and [5]
are equivalent to this COM estimation. The iterative method of [3] will
converge to the COM estimate.

This work was supported by NSF Grant ENG70-09033.

The COM method concentrates the estimation on the measurement interval
regardless of where the true signal is concentrated. If the signal is known
in advance to be concentrated on an interval larger than that over which the
measurements are taken, then this additional information should be
incorporated into the estimation method. In this paper, a new estimation
method is described which incorporates this additional information about the
signal into the solution [10]. This new method will be called concentrated on
the signal (COS) estimation.

The new COS estimation method will be derived using deterministic
estimation theory. A short presentation of this theory will first be made.
Then, the COS estimation method will be derived. Next, the new COS method
will be compared to the COM method on a practical example. A new sampling
rate criterion for COS estimation will be discussed. The sensitivity of the
method to errors in the data will be determined. Finally, a recursive
formulation for the new estimation procedure will be presented.

Deterministic  Estimation 

In a deterministic estimation problem [6,7], some measurement of the
deterministic signal x is desired. This desired measurement could be a
frequency sample, a time sample, a derivative sample, or any other
measurement which is a continuous linear functional of the signal x. The
information known about x is incomplete; therefore, the desired measurement
must be estimated from the limited knowledge about x. The estimate is made
optimal by minimizing the maximum magnitude of the error.

The limited information about x consists of knowing x is an element
in a signal space, knowing certain measurements on x, and having a bound on
the size of x. In more detail, the signal x is known to be a member of a
Hilbert space $H$. The inner product in this space is denoted $(\cdot,\cdot)_H$.
The known measurements on x are the N continuous linear functionals

$$(x, u_k)_H = a_k , \quad k = 1, 2, \ldots, N . \tag{2}$$

The bound on the size of x is given by

$$(x, x)_H \le \epsilon^2 . \tag{3}$$

This given information defines the set R of possible signals,

$$R = \{ x \in H : (x, u_k)_H = a_k ,\ k = 1, 2, \ldots, N \ \text{and} \ (x, x)_H \le \epsilon^2 \} . \tag{4}$$

The desired measurement on x is also a continuous linear functional
on x:

$$(x, u_0)_H = a_0 . \tag{5}$$

A value for $a_0$ must be estimated from the knowledge that $x \in R$. The
estimate, $\hat{a}_0$, is selected to minimize $\max |a_0 - \hat{a}_0|$. The error in this
estimation is bounded by

$$E(R) = \min_{\hat{a}_0} \max_{x \in R} |a_0 - \hat{a}_0| . \tag{6}$$

In [6,7], the solution to this estimation problem is derived using
the geometry of $H$. Since the bound on the size of x, (3), is a quadratic
form, the optimal estimate is a linear combination of the data, (2). In
fact, the optimal reconstruction of the entire signal, $\hat{x}$, is a linear
combination of the data. From $\hat{x}$, any desired measurement on x is estimated
by

$$\hat{a}_0 = (\hat{x}, u_0)_H . \tag{7}$$

The optimal reconstruction of x is the unique signal which is in
R and also in the subspace spanned by the linearly independent measurement
signals, $u_k$, $k = 1, 2, \ldots, N$ [6,7]. Thus, $\hat{x}$ is a linear combination of the
$u_k$'s:

$$\hat{x} = \sum_{k=1}^{N} b_k u_k . \tag{8}$$

Since $\hat{x} \in R$, $\hat{x}$ must satisfy (2):

$$(\hat{x}, u_l)_H = a_l , \quad l = 1, 2, \ldots, N . \tag{9}$$

Using (8) in (9) leads to

$$\sum_{k=1}^{N} b_k (u_k, u_l)_H = a_l , \quad l = 1, 2, \ldots, N . \tag{10}$$

The coefficients for the expansion in (8) can now be solved for from (10).
Using matrix notation, $\hat{x}$ is

$$\hat{x} = [a_l]^T \big[ (u_k, u_l)_H \big]^{-1} [u_k] , \quad k, l = 1, 2, \ldots, N , \tag{11}$$

where $[u_k]$ is a column of the measurement signals. Now, replacing $\hat{x}$ in (7)
by (11) gives

$$\hat{a}_0 = [a_l]^T \big[ (u_k, u_l)_H \big]^{-1} \big[ (u_k, u_0)_H \big] , \quad k, l = 1, 2, \ldots, N . \tag{12}$$

The optimal estimate may be calculated directly from the data, (12), or
may be calculated by taking the desired measurements on the reconstructed
signal using (7) and (11).


The error bound of (6) is evaluated in [7] as

$$E(R) = \Big[ (u_0, u_0)_H - \big[(u_l, u_0)_H\big]^T \big[(u_k, u_l)_H\big]^{-1} \big[(u_k, u_0)_H\big] \Big]^{1/2} \Big[ \epsilon^2 - [a_l]^T \big[(u_k, u_l)_H\big]^{-1} [a_k] \Big]^{1/2} , \quad k, l = 1, 2, \ldots, N . \tag{13}$$

In (13), the first term specifies the error in approximating $u_0$ by a linear
combination of the $u_k$'s. The second term in (13) measures how large the set
R of possible signals is and depends on the data.
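The chain (8)-(13) can be checked numerically in a finite-dimensional setting. The sketch below is illustrative only: it assumes the ordinary Euclidean dot product as a stand-in for the inner product of the signal space, uses random vectors for the measurement signals, and sets the bound in (3) to the true energy of the test signal. None of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 8, 3
U = rng.standard_normal((N, n))      # rows are the measurement signals u_k
u0 = rng.standard_normal(n)          # signal defining the desired measurement
x = rng.standard_normal(n)           # "unknown" signal, used only to create data

a = U @ x                            # data (2)
gram = U @ U.T                       # matrix of inner products (u_k, u_l)
b = np.linalg.solve(gram, a)         # coefficients from (10)
x_hat = U.T @ b                      # reconstruction (8), (11)
a0_hat = x_hat @ u0                  # estimate (7), equal to (12)

eps2 = x @ x                         # tightest valid bound in (3)
g0 = U @ u0                          # inner products (u_k, u_0)
term1 = u0 @ u0 - g0 @ np.linalg.solve(gram, g0)   # error in approximating u_0
term2 = eps2 - a @ b                               # size of the set R
bound = np.sqrt(max(term1, 0.0) * max(term2, 0.0)) # error bound (13)
```

The reconstruction reproduces the data exactly, has no more energy than the true signal, and the actual error in the desired measurement stays within the bound.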


A property of the optimal reconstruction shown in [7] is

$$(\hat{x}, \hat{x})_H = \min_{x \in R} (x, x)_H . \tag{14}$$

This property will be used to formulate a new estimation procedure in which
the estimated signal is concentrated on a known signal interval.



Estimation  Concentrated  On  The  Signal 

Consider the problem of reconstructing a bandlimited signal from a
finite data segment. In [2], the best reconstruction of x with this given
information has minimum energy. Since the energy of the signal on the
samples, n = -M, -M+1, ..., M is fixed, this reconstruction also has minimum
energy tails outside the measurement interval. The reconstruction procedure
(COM) is concentrated on the measurement interval. If the signal is
known to be of longer duration than from -M to M, this information should
be used to extend the concentration beyond the measurement interval to the
larger signal interval. The new estimation method developed in this paper
(COS) is concentrated on the signal interval.


Let B denote the bandlimiting operator so that for finite energy x(n),
y = Bx implies

$$Y(f) = \begin{cases} X(f) & \beta_1 \le |f| \le \beta_2 \\ 0 & \text{otherwise} \end{cases}$$

for $0 \le \beta_1 < \beta_2 < .5$. Here, Y(f) and X(f) denote the transforms of y(n) and
x(n) as given by (1). Next, let D denote the timelimiting operator so that
y = Dx implies

$$y(n) = \begin{cases} x(n) & |n| \le L \\ 0 & \text{otherwise.} \end{cases}$$

The operator (I - D) will retain the tails of the signal it operates on:

$$(I - D)x = \begin{cases} 0 & |n| \le L \\ x(n) & |n| > L . \end{cases}$$


For the COS estimation, the property of concentrating the
reconstruction of x on the signal interval can also be viewed as minimizing
the tails outside of the signal interval. Recalling (14) leads to the
selection of an inner product which deals with the tails outside of the
signal interval, n = -L, -L+1, ..., L. This inner product will be denoted
$(\cdot,\cdot)_{I-D}$ and is defined by

$$(x, y)_{I-D} = \sum_{|n| > L} x(n)\, y^*(n) . \tag{15}$$

Using this inner product in the deterministic estimation will result in
selecting as the reconstruction that signal which fits the data and has
minimum energy tails outside of the signal interval. If L = M, COM and COS
are identical. When L > M the methods are different.
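A small sketch of the operators D and (I - D) and of the tail inner product (15) may make the definitions concrete. The helper names and the truncation of the infinite sum to a finite index range are assumptions made here for illustration.

```python
import numpy as np

def timelimit(x, n, L):
    """D x: keep the samples with |n| <= L and zero the rest."""
    return np.where(np.abs(n) <= L, x, 0.0)

def tails(x, n, L):
    """(I - D) x: keep only the samples with |n| > L."""
    return np.where(np.abs(n) > L, x, 0.0)

def tail_inner(x, y, n, L):
    """Inner product (15): sum over |n| > L of x(n) y*(n)."""
    mask = np.abs(n) > L
    return np.sum(x[mask] * np.conj(y[mask]))

# Hypothetical finite index range standing in for all integers n.
n = np.arange(-10, 11)
x = n.astype(float)
tail_energy = tail_inner(x, x, n, 3)
```

Because the inner product only sees samples outside the signal interval, minimizing the corresponding norm penalizes energy in the tails and leaves the interior of the interval unconstrained except through the data.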




The space of bandlimited signals concentrated on [-L, -L+1, ..., L]
as discussed in [2] is given by

$$H = \{ x : x = BDy ,\ y \ \text{has finite energy} \} . \tag{16}$$

A deterministic estimation problem can now be formulated which will give
the desired COS estimation method. The Hilbert space $H$ from (16) with
inner product $(\cdot,\cdot)_{I-D}$ given by (15) will be the known signal space.
This implies that the bandwidth and time duration are assumed known.
Signals to be estimated will have bounded energy in the tails:
$(x, x)_{I-D} \le \epsilon^2$. The known measurements on x will be the N time samples
$a_k = x(n_k)$, $k = 1, 2, \ldots, N$. This information specifies the unknown
signal as an element of the set

$$R = \{ x \in H : (x, x)_{I-D} \le \epsilon^2 \ \text{and} \ x(n_k) = a_k ,\ k = 1, 2, \ldots, N \} .$$

The estimation problem for any desired measurement can now be solved
if $\hat{x}$ of (11) can be found. To this end, the measurement signals of (2)
with respect to $(\cdot,\cdot)_{I-D}$ must be found.

From [2], a complete, orthonormal basis for $H$ is the set of signals

$$y_i(n) = \lambda_i^{1/2}\, v_i(n+L) , \quad i = 0, 1, \ldots, 2L , \tag{17}$$

where $\lambda_i$ and $v_i(n+L)$ are the eigenvalues and discrete prolate spheroidal
sequences of [2] extended to the bandpass case. For any $x \in H$,

$$x = \sum_{i=0}^{2L} c_i y_i , \tag{18}$$

where

$$c_i = \sum_{n=-\infty}^{\infty} x(n)\, y_i(n) . \tag{19}$$

These basis signals are also orthogonal on the signal interval:

$$\sum_{n=-L}^{L} y_i(n)\, y_j(n) = \lambda_i \delta_{ij} , \quad i, j = 0, 1, \ldots, 2L . \tag{20}$$

Using the orthonormality of the $y_i$'s and (20) leads to

$$(y_i, y_j)_{I-D} = (1 - \lambda_i)\, \delta_{ij} , \quad i, j = 0, 1, \ldots, 2L . \tag{21}$$


Now, define

$$K(n, m) = \sum_{j=0}^{2L} \frac{1}{1 - \lambda_j}\, y_j(n)\, y_j(m) . \tag{22}$$

For any $x \in H$, (22) gives

$$(x, K(n, \cdot))_{I-D} = \sum_{j=0}^{2L} \frac{1}{1 - \lambda_j}\, y_j(n)\, (x, y_j)_{I-D} . \tag{23}$$

Substituting (18) in (23) yields

$$(x, K(n, \cdot))_{I-D} = \sum_{j=0}^{2L} \frac{1}{1 - \lambda_j}\, y_j(n) \sum_{i=0}^{2L} c_i\, (y_i, y_j)_{I-D} . \tag{24}$$

Applying (21) to (24) and simplifying gives

$$(x, K(n, \cdot))_{I-D} = \sum_{j=0}^{2L} c_j\, y_j(n) = x(n) . \tag{25}$$

From (25), the measurement functions sought are seen to be

$$u_k = K(n_k, \cdot) = \sum_{i=0}^{2L} \frac{1}{1 - \lambda_i}\, y_i(n_k)\, y_i , \tag{26}$$

since the data can then be written as

$$x(n_k) = (x, u_k)_{I-D} = a_k .$$

Now that the measurement signals have been found, the optimal
reconstruction of x can be calculated using (11). The matrix to be inverted
has elements $(u_k, u_l)_{I-D}$. Since $u_k \in H$,

$$(u_k, u_l)_{I-D} = u_k(n_l) = \sum_{i=0}^{2L} \frac{1}{1 - \lambda_i}\, y_i(n_k)\, y_i(n_l) . \tag{27}$$

Using (27) and (26) in (11) gives $\hat{x}$.

Now that the optimal reconstruction of the unknown signal has been
found, any desired measurement can be estimated using (7). Two common
estimation problems are extrapolation and spectral estimation. For
extrapolation, $x(n_0)$ is estimated as $\hat{x}(n_0)$. For spectral estimation, $X(f_0)$
is estimated as $\hat{X}(f_0)$. If error bounds from (13) are desired for these
estimates, then the signal $u_0$ of (5) needs to be found. For extrapolation,

$$u_0(n) = K(n_0, n) . \tag{28}$$

The evaluation of (13) requires

$$(u_0, u_0)_{I-D} = K(n_0, n_0)$$

and

$$(u_k, u_0)_{I-D} = K(n_k, n_0) .$$

For spectral estimation,

$$u_0(n) = \sum_{i=0}^{2L} \frac{1}{1 - \lambda_i}\, Y_i^*(f_0)\, y_i(n) , \tag{29}$$

where

$$Y_i(f_0) = \sum_{n=-\infty}^{\infty} y_i(n)\, e^{-j 2\pi f_0 n} .$$

The evaluation of (13) now requires

$$(u_0, u_0)_{I-D} = \sum_{i=0}^{2L} \frac{1}{1 - \lambda_i}\, |Y_i(f_0)|^2$$

and

$$(u_k, u_0)_{I-D} = \sum_{i=0}^{2L} \frac{1}{1 - \lambda_i}\, Y_i(f_0)\, y_i(n_k) .$$

With this COS solution to the estimation problem, the new method can
be compared to the COM method on a practical example.


Comparison  Of  Methods 

An application in which knowledge of signal duration is valuable
consists of a problem in which signals of known duration have been
overlapped. Consider the signal

$$x_1(t) = \begin{cases} 2000\, t\, e^{-400t} \sin 2\pi 400 t & t \ge 0 \\ 0 & t < 0 . \end{cases} \tag{30}$$

This signal has a time duration of about 12.5 msec. and is approximately
bandlimited with a bandwidth of 500 Hz. Now, several of these signals are
overlapped and added:

$$x(t) = \tfrac{1}{2} x_1(t) + x_1(t - .009) + 2 x_1(t - .0177) . \tag{31}$$

The components of this signal are shown in Fig. 3. From this composite
signal, information about the middle component is desired. Since the
duration of the first component is 12.5 msec. and the third component
starts at t = 17.7 msec., good measurements for the second component are
available in the interval $12.5 \le t \le 17.7$. With a sample spacing of 0.5 msec.,
11 measurements are made on the second component.

In the discrete time formulation, the second component is considered
as a discrete time signal centered about n = 0 and concentrated on n = -12, -11,
..., 12 (corresponding to 12.5 msec. sampled every 0.5 msec.). The 11
measurements are therefore the samples at n = -5, -4, ..., 5. The assumed
bandwidth  for  the  signal  is  p,**.05  Hz  and  Hz  using  normalized 

frequency. 

Extrapolation  and  spectral  estimation  for  the  COS  and  COM  methods  can 
now  be  compared.  The  extrapolations  for  both  methods  are  shown  in  Fig.  2. 
The  actual  desired  signal  is  the  dotted  curve.  Fig.  3  shows  the  spectral 
estimates  obtained  using  the  two  methods.  As  can  be  seen,  the  COS  method 
gives  a  better  reconstruction  of  the  signal  and  a  correspondingly  more 
accurate  spectrum  than  the  COM  method. 


Sampling  Rate  For  COS  Estimation 

The COS estimation method deals with bandlimited, time-concentrated
signals. The class of signals with bandwidth $\beta$ and duration $T$ has
approximately $2\beta T$ dimensions [2]. Therefore, approximately $2\beta T$ measurements
should give a good estimate of a signal in this class. This is verified by
the error bound of (13). The first term in (13) was calculated for the COS
spectral estimation problem of Fig. 3. This term of the error bound was
evaluated at 9 frequencies between $f_0 = .05$ and $f_0 = .3$. The resulting RMS
value is plotted versus the number of samples used for the estimation in
Fig. 4. The error bound has reached a small value near $2\beta T$ measurements.

The need for about $2\beta T$ measurements determines the sampling rate at
which the measurements are taken. If $T_M$ is the duration of the measurement
interval and $T$ is the duration of the signal, then the $2\beta T$ samples must
be taken in the measurement interval. This requires a sampling period of

$$T_s = \frac{T_M}{2\beta T} . \tag{32}$$

From (32), if the measurement interval decreases with respect to the signal
interval, the sampling rate must increase to retain $2\beta T$ samples. This
increased sampling rate will cause an increase in the sensitivity of the
estimates to errors in the data measurements.


Sensitivity

The increase in sensitivity as a result of increased sampling rate
affects both estimation methods. These methods are linear, and so the
effects of errors on the data can be treated separately. Let e be an error
signal which is added to the true signal x. Let A be the linear estimation
operator which takes the measurements on the signal and generates the
desired estimate. Then,

$$A(x + e) = Ax + Ae . \tag{33}$$

The term Ae is the additional error in the estimation due to the
measurement errors. This can be bounded by

$$\|Ae\| \le \|A\| \cdot \|e\| . \tag{34}$$

The bound on the sensitivity of the estimation to errors is directly related
to $\|A\|$. An approximation to $\|A\|$ for both COM and COS is given by

$$\|A\| \approx 0.6\, [5(1 - 2\alpha)]^N \quad \text{for } .1 \le \alpha \le .35 , \tag{35}$$

where the parameter $\alpha$ is the normalized bandwidth of the discrete time
signal and N is the number of measurements used. From (32), the value of
$\alpha$ can be calculated for the lowpass case as $\alpha = T_M / (2T)$. If N is selected as
$2\beta T$ and $T$ is held fixed, then the sensitivity increases as $T_M$ decreases.

Currently,  estimation  with  smoothing  (see[8])  Is  being  investigated  as 
a  means  of  reducing  sensitivity  to  noise. 
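
As a worked illustration of (32) and (35), the following Python sketch computes
the sampling period and the approximate operator norm as the measurement
interval shrinks. The function names, the example bandwidth and durations, and
the reading $\alpha = \beta T_s$ for the lowpass case are assumptions made for
this illustration and are not taken from the paper.

```python
def sampling_period(t_meas, beta, t_sig):
    """Eq. (32): fit 2*beta*t_sig samples into a measurement interval t_meas."""
    return t_meas / (2.0 * beta * t_sig)


def normalized_bandwidth(t_meas, beta, t_sig):
    """Assumed lowpass reading: alpha = beta * T_s (cycles per sample)."""
    return beta * sampling_period(t_meas, beta, t_sig)


def sensitivity_norm(alpha, n):
    """Eq. (35): ||A|| ~ 0.6 * [5 * (1 - 2*alpha)]**n, quoted for .1 <= alpha <= .35."""
    if not (0.1 - 1e-9 <= alpha <= 0.35 + 1e-9):
        raise ValueError("approximation (35) is only quoted for .1 <= alpha <= .35")
    return 0.6 * (5.0 * (1.0 - 2.0 * alpha)) ** n


if __name__ == "__main__":
    beta, t_sig = 0.1, 50.0            # hypothetical bandwidth and signal duration
    n = round(2 * beta * t_sig)        # about 2*beta*T samples (10 here)
    for t_meas in (30.0, 20.0, 10.0):  # shrinking measurement interval
        alpha = normalized_bandwidth(t_meas, beta, t_sig)
        print(t_meas, sampling_period(t_meas, beta, t_sig), alpha,
              sensitivity_norm(alpha, n))
```

With the number of samples held fixed, the printed norm grows as the
measurement interval is shortened, which is the trend described above.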




[Equation (41) is not legible in the source.]


Thus, the $c_i$'s required in (37) are calculated from the data using the
coefficients given in (39) from (36).

For any desired measurement, the estimate is found by substituting
(37) in (7):

$$ a_0 = \sum_{i=1}^{N} c_i \,(w_i, u_0). \tag{42} $$

This estimate can be calculated recursively using

$$ a_0 = \sum_{i=1}^{N-1} c_i \,(w_i, u_0) + c_N \,(w_N, u_0). \tag{43} $$

The first term in (43) is the estimate given the first N-1 measurements. The
second term updates the estimate when the Nth data measurement is included.


Conclusions 


The application of deterministic estimation theory to signal processing
problems can lead to interesting new estimation methods. The new COS estimation
method described in this paper has been seen to give improved estimates
compared to the COM method when the signal concentration interval is
larger than the measurement interval. The COS method concentrates the
estimation on the signal interval as opposed to the COM method which
concentrates the estimation on the measurement interval. Consideration of the
error bounds for the new method leads to a new criterion for the sampling
rate to be used. The sensitivity of the estimation to errors in the data
has also been presented. Finally, a recursive implementation of the new
method has been described.


References 


1. R.B. Blackman and J.W. Tukey, "The Measurement of Power Spectra," New
York: Dover, 1958.

2. D. Slepian, "Prolate Spheroidal Wave Functions, Fourier Analysis, and
Uncertainty - V: The Discrete Case," BSTJ, Vol. 57, No. 5, May-June
1978, pp. 1371-1430.

3. A. Papoulis, "A New Algorithm in Spectral Analysis and Bandlimited
Extrapolation," IEEE Trans. on Circuits and Systems, Vol. CAS-22, No. 9,
pp. 735-742, Sept. 1975.

4. J.A. Cadzow, "An Extrapolation Procedure for Band-Limited Signals,"
IEEE Trans. on ASSP, Vol. ASSP-27, No. 1, pp. 4-12, Feb. 1979.

5. D.P. Kolba and T.W. Parks, "Extrapolation and Spectral Estimation for
Bandlimited Signals," 1978 ICASSP Record, pp. 372-374.

6. M. Golomb, Lectures on the Theory of Approximation, Argonne National
Laboratory, 1962.

7. M. Golomb and H. Weinberger, "Optimal Approximation and Error Bounds,"
On Numerical Approximation, ed. R. Langer, Univ. of Wisconsin Press,
Madison, pp. 117-190, 1959.

8. H.L. Weinert, "A Reproducing Kernel Hilbert Space Approach to Spline
Problems with Applications in Estimation and Control," Information
Systems Lab., Stanford Univ., Technical Report No. 7001-6, May 1972.

9. L.E. Franks, Signal Theory, Prentice Hall Inc., Englewood Cliffs, NJ,
1969.

10. D.P. Kolba and T.W. Parks, "Extrapolation and Spectral Estimation for
Bandlimited, Time-Concentrated Signals," 1979 ICASSP Record, pp. 190-193.


[Figure caption fragment: "... time-concentrated signals."]

[Figure caption fragment: "... extrapolation estimates. (a) COM estimate
with M=5. (b) COS estimate with M=5 and L=12. The actual signal is the
dashed curve."]

Figure 3. Comparison of spectral estimates with assumed bandwidth [values
illegible]. (a) COM estimate. (b) COS estimate. The actual spectrum is the
dashed curve.

Figure 4. Maximum error bound versus number of samples for spectral
estimation.


THE  USE  OF  LINEAR  PREDICTION  FOR  THE  INTERPOLATION 
AND  EXTRAPOLATION  OF  MISSING  DATA  AND  DATA  GAPS 
PRIOR  TO  SPECTRAL  ANALYSIS 

STEPHEN  B.  BOWLING 
SHU  LAI 


Massachusetts  Institute  of  Technology 
Lincoln  Laboratory 
Lexington,  Massachusetts  02173 


Abstract 

The spectral analysis of a series of equally spaced samples of a coherent
time-stationary process becomes difficult when samples are missing or sizable
data gaps occur within the interval of interest. A linear prediction algorithm
can be used to fill in the missing data with estimates that are spectrally
consistent with the data that are observed. Simulated and practical
radar examples demonstrate an improvement in resolution and a reduction of
sidelobe interference levels.

Problem  Definition 

When  a  spectral  transformation  of  a  sampled  process  is  performed,  one 
must  account  for  any  samples  that  are  missing.  Assigning  a  value  of  zero  to 
missing  data  prior  to  Fourier  transformation,  for  example,  introduces  false 
frequencies  and  greatly  increases  sidelobe  levels.  Clearly,  an  interpolation 
scheme  is  needed  that  can  cope  with  missing  data  and,  at  the  same  time,  will 
not  degrade  the  spectral  information  contained  in  the  data  that  are  observed. 
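
The effect of zero filling can be seen in a small numerical experiment. The
Python sketch below compares the unweighted DFT power spectrum of a complete
complex tone with that of the same tone after some samples are set to zero.
The tone frequency, record length, and pattern of missing samples are
hypothetical values chosen for illustration only.

```python
import cmath
import math


def dft_power(x):
    """Unweighted DFT power spectrum |X[k]|^2 (direct O(N^2) sum)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n)]


def peak_sidelobe_db(power, peak_bin):
    """Largest off-peak bin relative to the peak bin, in dB."""
    side = max(p for k, p in enumerate(power) if k != peak_bin)
    return 10.0 * math.log10(side / power[peak_bin]) if side > 0 else -math.inf


N, BIN = 32, 5
tone = [cmath.exp(2j * cmath.pi * BIN * t / N) for t in range(N)]
missing = {3, 4, 10, 11, 12, 13, 20, 27}   # hypothetical dropouts and one gap
zero_filled = [0j if t in missing else v for t, v in enumerate(tone)]

full_db = peak_sidelobe_db(dft_power(tone), BIN)
gapped_db = peak_sidelobe_db(dft_power(zero_filled), BIN)
print("complete record:", full_db, "dB; zero-filled record:", gapped_db, "dB")
```

For the complete record the off-peak bins contain only rounding noise, while
the zero-filled record shows sidelobes that could be mistaken for real
spectral components.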

Occasional  missing  samples,  well  separated  from  each  other,  can  be 
estimated  with  simple  interpolation  procedures  (polynomial  or  parabolic  fits, 
spline fits, etc.). However, data may be missing in such quantity that
conventional interpolation is inadequate; data gaps longer than the periods of
the  sinusoidal  components  in  the  data  cannot  be  easily  bridged  with  simple 
functions.  A  more  sophisticated  approach  becomes  necessary,  and  the  use  of  a 
data-adaptive  linear  prediction  filter  is  one  feasible  alternative. 

In  radar  data  processing,  missing  data  or  data  gaps  may  occur  for  a 
variety  of  reasons: 


(a) hardware fails to transmit pulses or receive echoes properly;

(b) radar transmits when it should be receiving echoes (range
eclipsing);

(c) resources are saturated by many targets that must be watched
simultaneously (panic);

(d) burst waveforms are purposely silent between bursts;

(e) poor signal-to-noise makes detections sporadically unreliable.

In  any  case,  the  missing  samples  (in  these  examples,  complex  samples  with 
amplitude  and  phase)  must  be  filled  in  before  Doppler  processing  can  be 
accomplished. 


Description  of  the  Method 

The use of a linear prediction filter to extend a finite complex data set
before Fourier transformation was first proposed and described by Bowling
(1977). Applying this original algorithm, Tomlinson and Ackerson (1978)
demonstrated clutter and sidelobe reduction in the Doppler processing of a
train of radar pulses.

In the application of interest here, the prediction algorithm is used to
estimate missing data by extrapolating from observed data. For
example, suppose an observation interval contains randomly missing samples and
gaps. The procedure is as follows:

(1) Locate and designate the missing samples to be estimated.

(2) Find the longest continuous span of data within which
there are no missing samples.

(3) Calculate an N-point linear prediction filter from the
longest continuous span of data found in step (2).

(4) Calculate an estimate of each missing sample immediately
to the left and to the right of the longest continuous
span of data (a total of two estimates, one on each side).

(5) Return to step (2) until all missing data have been estimated.
(Note that estimates from step (4) are to be treated
as observations on an equal basis with the original data.
That is, the "longest continuous span of data" is increasing
in length as estimates fill in the holes, one by one, to
the left and to the right.)


When  the  longest  continuous  span  of  data  finally  terminates  at  one  of  the 
endpoints  of  the  observation  interval,  estimates  continue  to  be  made  toward 
the  other  endpoint  until  all  missing  points  have  been  filled  in.  The  length 
of  the  prediction  filter  may  remain  a  constant,  or  vary  according  to  the 
current  length  of  the  longest  span  of  data. 
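
Steps (1) through (5) can be sketched in Python as follows. This section does
not say how the prediction filter itself is computed, so the sketch assumes
Burg's recursion for that step; the names `burg`, `longest_span`, and
`fill_gaps`, and the use of `None` to mark a missing sample, are choices made
for this illustration rather than the authors' program.

```python
import cmath


def burg(x, order):
    """AR coefficients [1, a1, ..., aM] by Burg's recursion (an assumed choice)."""
    a = [1 + 0j]
    f, b = list(x[1:]), list(x[:-1])       # forward / backward errors, order 0
    energy = sum(abs(v) ** 2 for v in x)
    for _ in range(order):
        den = sum(abs(u) ** 2 + abs(v) ** 2 for u, v in zip(f, b))
        if den <= 1e-12 * energy:          # residual is negligible: stop early
            break
        k = -2 * sum(u * v.conjugate() for u, v in zip(f, b)) / den
        ext = a + [0j]
        a = [ext[i] + k * ext[-1 - i].conjugate() for i in range(len(ext))]
        f, b = ([u + k * v for u, v in zip(f, b)][1:],
                [v + k.conjugate() * u for u, v in zip(f, b)][:-1])
    return a


def longest_span(x):
    """(start, stop) of the longest run without None; stop is exclusive."""
    best, start = (0, 0), None
    for i, v in enumerate(list(x) + [None]):
        if v is not None and start is None:
            start = i
        elif v is None and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best


def fill_gaps(samples, order):
    """Fill None entries one sample at a time on each side of the longest span."""
    x = list(samples)
    if all(v is None for v in x):
        raise ValueError("no observed samples")
    while any(v is None for v in x):                    # step (5): repeat
        lo, hi = longest_span(x)                        # step (2)
        a = burg(x[lo:hi], min(order, hi - lo - 1))     # step (3)
        m = len(a) - 1
        right = left = None
        if hi < len(x):                                 # step (4): predict forward
            right = -sum(a[k] * x[hi - k] for k in range(1, m + 1))
        if lo > 0:                                      # step (4): predict backward
            left = -sum(a[k].conjugate() * x[lo - 1 + k] for k in range(1, m + 1))
        if right is not None:
            x[hi] = right
        if left is not None:
            x[lo - 1] = left
    return x


if __name__ == "__main__":
    z = cmath.exp(0.3j)
    truth = [z ** n for n in range(20)]
    gapped = [None if n in (3, 10, 11, 12) else v for n, v in enumerate(truth)]
    filled = fill_gaps(gapped, order=1)
    print(max(abs(u - v) for u, v in zip(filled, truth)))
```

Here the filter length is capped by the current span length, which is one of
the two options mentioned above; a fixed length could be used instead when
every span is long enough.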

Simulated  Examples 

A  simple  example  shows  the  improvement  in  the  power  spectrum  of  a  data 
set  containing  missing  samples  and  gaps. 

The real and imaginary parts of a sampled sum of sinusoids are shown in
Figs. 1(a,b). No noise has been added and no samples are missing. Figure 1(c)
is the true power spectrum calculated with a standard FFT with no weighting.

Now if samples are randomly zeroed out and data gaps are introduced as
shown in Figs. 2(a,b), the power spectrum in Fig. 2(c) shows increased
sidelobe levels and false frequencies, both caused by processing without
estimating the missing data.

Figures 3(a,b) show the data set after the linear prediction algorithm is
applied, with the power spectrum shown in Fig. 3(c). Not only do Figs. 1(a,b)
overlay with 3(a,b) almost exactly, but their respective power spectra are
indistinguishable.

Another simple example demonstrates the performance of the linear prediction
algorithm when data gaps occur periodically, such as is the case for
a radar burst waveform.

Figures 4(a,b) represent the process of Figs. 1(a,b) for which three data
gaps are present. Indeed, half of the data are missing from the observation
interval, and the gaps are longer than any period exhibited in the data. The
power spectrum of Fig. 4(c) is a very poor estimate of the true spectrum
(Fig. 1(c)) because no gaps have been filled in. Transforming only one of
the short spans of observed data gives a power spectrum with limited
resolution, as shown in Fig. 4(d).

However,  upon  using  the  prediction  algorithm  on  Figs.  4(a,b),  we  obtain 
Figs.  5(a,b)  and  the  power  spectrum  in  Fig.  5(c),  which  is  a  good  estimate 
of  the  true  spectrum. 

In this case, the prediction algorithm has acted as a synergistic device
that, by linking short pieces of data together with spectrally consistent
estimates, allows a spectral transform to be performed over an effectively
longer piece of data. The whole, then, has more resolving power than any of
its parts.




It  should  be  pointed  out  that  the  data  gaps  need  not  be  periodic  or 
equal  in  length  in  order  for  the  prediction  algorithm  to  fill  them  in. 

Radar  Example 

Radar  is  often  used  to  identify  targets  from  the  time  history  of  the 
velocity  spectrum  of  the  target's  motion  about  its  center  of  mass.  A  series 
or burst of radar pulses is Fourier analyzed, and the target's velocity
spectrum is observed. If not accounted for in the processing, missing pulses can
introduce  false  velocity  components  and  lead  to  an  incorrect  characterization 
or  classification  of  the  target. 

For  example,  Fig.  6(a)  shows  the  evolution  of  the  velocity  spectrum  of 
a  tumbling  object  for  which  missing  data  and  data  gaps  exist  and  are  set  to 
zero  in  the  radar  pulse  train.  No  estimation  for  the  missing  pulses  has  been 
done.  It  is  therefore  not  clear  if  the  velocities  indicated  are  actually 
from  the  target  or  are  an  artifact  of  the  missing  data.  Figure  6(b)  shows  the 
evolution  of  the  same  velocity  spectrum  upon  using  the  prediction  algorithm 
before  Fourier  transformation.  The  disappearance  of  some  of  the  velocities 
cleans  up  the  spectral  history  and  indicates  which  velocity  components  actually 
characterize  the  target. 


Limitations  of  the  Method 


Implicit  in  the  use  of  a  linear  prediction  filter  is  the  assumption  that 
the  data  from  which  the  filter  is  derived  are  time-stationary.  The  process 
being  sampled  must  be  coherent  during  the  observation  interval  which  is  being 
analyzed  and  within  which  the  missing  data  and  data  gaps  may  occur. 

Also,  the  prediction  filter  works  best  when  the  spectral  components  are 
approximately  pure  tones,  confined  to  locally  narrow  bandwidths  spaced  within 
the  Nyquist  bounds  of  the  spectral  transform  domain. 

Summary 

This  paper  proposes  the  use  of  a  linear  prediction  algorithm  to  fill  in 
missing  data  and  data  gaps  that  may  occur  within  an  observation  interval  over 
which  a  spectral  transform  is  to  be  made.  False  frequencies  and  sidelobe 
interference,  which  are  artifacts  of  the  missing  samples,  can  be  suppressed 
or eliminated by replacing the missing samples with estimates that are
spectrally consistent with neighboring observed data. Large gaps can be smoothly
bridged that otherwise could not be satisfactorily interpolated by simpler
schemes.

Computer programs which accomplish the interpolation and extrapolation
procedures for complex data can be found in Ref. [3].




References 

1. Bowling, S. B., "Linear Prediction and Maximum Entropy Spectral Analysis
for Radar Applications," Project Report RMP-122, Lincoln Laboratory,
M.I.T. (24 May 1977), DDC-AD-A042817/7.

2. Tomlinson, P. G., and G. A. Ackerson, "Air Vehicle Detection Using
Advanced Spectral Techniques," Proceedings of the RADC Spectrum Estimation
Workshop, Rome Air Development Center, Rome, New York (May 1978).

3. Bowling, S. B. and S. Lai, "Use of Linear Prediction for the Interpolation
and Extrapolation of Missing Data and Data Gaps," Report TN-1979-46,
Lincoln Laboratory, M.I.T. (to be published).


This  work  was  sponsored  by  Ballistic  Missile  Defense  Advanced  Technology 
Center,  Department  of  the  Army. 

The  views  and  conclusions  contained  in  this  document  are  those  of  the 
contractor  and  should  not  be  interpreted  as  necessarily  representing  the 
official  policies,  either  expressed  or  implied,  of  the  United  States 
Government. 


FIG. 6 (a) Doppler history of tumbling object when missing data and
data gaps are not accounted for. (b) Doppler history after missing data are
filled in with the linear prediction algorithm. (Horizontal axis of both
panels: frequency.)


A  NEW  AUTOREGRESSIVE  SPECTRUM  ANALYSIS  ALGORITHM 


Larry  Marple 


Advent Systems, Inc.
355  Ravendale  Drive 
Mountain  View,  CA  94043 


Abstract 


A new recursive algorithm for autoregressive (AR) spectral estimation is
introduced, based on the least squares solution for the AR parameters using
forward and backward linear prediction. The algorithm has computational
complexity proportional to the order of the process squared, comparable to that
of the popular Burg algorithm. The computational efficiency is obtained by
exploiting the structure of the least squares normal matrix equation, which may
be decomposed into products of Toeplitz matrices. AR spectra generated by the
new algorithm have improved performance over AR spectra generated by the Burg
algorithm. These improvements include less bias in the frequency estimate of
spectral components, reduced variance in frequency over an ensemble of spectra,
and absence of observed spectral line splitting.

Introduction 


Autoregressive spectrum analysis, sometimes termed maximum entropy spectrum
analysis (MESA), has become a popular alternative to the periodogram as an
estimate of the power spectral density (PSD) for a sampled process. For signal
to noise ratios (SNRs) greater than 0 dB, the AR PSD estimate has higher
frequency resolution than that of the conventional periodogram estimate [1]. AR
spectral estimates also do not have the distortion produced by sidelobe leakage
effects that are inherent in the periodogram approach to spectrum analysis.
These are two of several attractive features of AR spectral estimation that
have created interest in this technique.

The means used to estimate the autoregressive model parameters is the key
to the performance of the AR technique. If M+1 lags of the autocorrelation
function for a process are known, the M autoregressive parameters are obtained
by solving the Yule-Walker normal equations using the Levinson recursion
algorithm [2]. The algorithm requires a number of computational operations
proportional to $M^2$.
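
A minimal Python sketch of the Levinson recursion for real-valued
autocorrelation lags is given below. The function name, the restriction to
real data, and the sign convention (prediction error written as
x[t] + a1 x[t-1] + ... + aM x[t-M]) are assumptions of this sketch.

```python
def levinson(r, order):
    """Solve the Yule-Walker equations from real lags r[0..order].

    Returns ([1, a1, ..., aM], prediction error power) for the convention
    x[t] + a1*x[t-1] + ... + aM*x[t-M] = e[t].
    """
    a = [1.0]
    err = r[0]
    for m in range(1, order + 1):
        acc = sum(a[i] * r[m - i] for i in range(m))
        k = -acc / err                       # reflection coefficient at order m
        ext = a + [0.0]
        a = [ext[i] + k * ext[m - i] for i in range(m + 1)]
        err *= 1.0 - k * k                   # updated error power
    return a, err


if __name__ == "__main__":
    # Lags of a first-order process with correlation 0.5 between neighbours.
    coeffs, power = levinson([1.0, 0.5, 0.25], order=2)
    print(coeffs, power)
```

Each order update touches every coefficient found so far, which is where the
operation count proportional to the square of the order comes from.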

A host of techniques are available for estimating the AR parameters from
data samples. The most obvious approach is to first make estimates of the
autocorrelation lags with the available data, and then to apply the usual
Levinson recursion with the estimated lags. This approach is rarely used,
primarily because better resolution may be obtained with other
estimation methods that obtain the AR parameters directly from the data. If
unbiased autocorrelation estimates are used, one may also run into numerical
ill-conditioning during the solution of the normal equations. Biased
autocorrelation estimates reduce the risk of ill-conditioning, but at the expense
of a degradation of the AR spectral resolution and a shifting of spectral peaks
from their true locations [1]. The shift effect is termed a frequency
estimation bias. Another reason that has made this a seldom used technique is the
problem of spectral line splitting. Spectral line splitting is the occurrence
of two or more closely-spaced peaks in an AR spectral estimate when only one
spectral peak should be present. The reasons for spectral line splitting in
the Yule-Walker technique have been documented by Kay and Marple [3].

The most popular approach for AR parameter estimation is the Burg algorithm
[4,5]. This algorithm utilizes a constrained least squares estimation
procedure to obtain the M autoregressive parameter estimates from N data
samples. The constraint requires the AR parameter estimates to satisfy the
Levinson recursion. The Burg algorithm requires computational operations
proportional to the product NM.

AR spectral estimates based on the Burg algorithm suffer from two of the
same problems observed in Yule-Walker estimates of the AR spectrum. The
problem of spectral line splitting in AR spectra produced by the Burg algorithm
was first documented by Fougere et al. [6]. They noted that spectral line
splitting was most likely to occur when: (1) the SNR is high, (2) the initial
phase of sinusoidal components is some odd multiple of 45°, (3) the time
duration of the data sequence is such that sinusoidal components have an odd
number of quarter cycles, and (4) the number of AR parameters estimated is a
large percentage of the number of data values used for the estimation. Many
spurious spectral peaks often accompany spectra that exhibit line splitting.

The connection between line splitting and the number of AR parameters
estimated (model order) highlights a problem area common to all methods of AR
spectrum analysis -- how to select the AR model order. Akaike [7] has
suggested two popular criteria for order determination. However, this author's
experience has shown that most order selection rules, including Akaike's, are
not enough to be effective against the line splitting phenomenon.

A second major problem area with the Burg algorithm, as with the Yule-Walker
case, is the bias in the positioning of spectral peaks with respect to
the true frequency location of those peaks. If one defines the foldover
frequency as $f = 1/(2\Delta t)$, where $\Delta t$ is the sample interval, then
it has been observed in real-valued data that spectral peaks with fractional
frequencies from 0 to .5f tend to be biased higher in frequency than their
actual values. Those peaks with fractional frequencies from .5f to 1.0f tend to
be biased lower in frequency than their actual values. Swingler [8] has shown
that the bias can pull the peak off frequency by as much as 16% of a resolution
cell when using the Burg algorithm.

In order to alleviate the spectral line splitting problem, Fougere [9]
devised a rather complicated gradient descent algorithm for AR parameter
estimation. The algorithm has been shown to work for selected one and two
sinusoid examples, but it is an iterative procedure that requires a much higher
computational effort than the popular Burg algorithm.

This paper presents a new algorithm for AR parameter estimation that
yields AR spectra with no apparent line splitting and reduced spectral peak
frequency estimation biases. A set of sensitive stopping rules for order
selection has been found for the algorithm. The method is based on an
unconstrained least squares estimation of the AR parameters first proposed by
Ulrych and Clayton [10], who termed it the least squares (LS) AR spectral
estimate. In their experiments with the LS estimate, they observed, for
processes with one and two sinusoids in noise, that LS generated spectra had
less variation of the spectral peaks from their actual frequencies as a
function of initial phase than Burg algorithm spectra. Nuttall [11] compared
the LS spectral estimate (which he termed the forward and backward prediction
method) to other AR spectral estimates, including the Burg estimate, for a
large ensemble of sampled AR processes. He found the LS estimate to be as good
as, and often better than, the other estimators. In fact, among all AR
estimation techniques examined by Nuttall, the LS method exhibited the least
variation in frequency.

A straightforward matrix solution of the linear simultaneous normal
equations for the LS method of AR parameter estimation has been the usual
computational approach. This requires a number of computational operations
proportional to $NM^2$, making it computationally unattractive relative to the
more efficient Burg algorithm. This paper presents an algorithm for solution of
the LS equations with a computational complexity proportional to NM, making it
comparable to that of the Burg algorithm.


Burg  Algorithm  Estimate  of  the  AR  Spectrum 


The popular approach for AR parameter estimation with data samples was
introduced by John Burg in 1968. The Burg algorithm may be viewed as a
constrained least squares estimation procedure. Assuming an all-pole stationary
stochastic process, the forward linear prediction error is given by

$$ f_{M,t} = x_{t+M} + \sum_{k=1}^{M} a_{M,k}\, x_{t+M-k} = \sum_{k=0}^{M} a_{M,k}\, x_{t+M-k} \tag{1} $$

for $1 \le t \le N-M$ and the backward linear prediction error is given by

$$ b_{M,t} = x_t + \sum_{k=1}^{M} a_{M,k}^{*}\, x_{t+k} = \sum_{k=0}^{M} a_{M,k}^{*}\, x_{t+k} \tag{2} $$

also for $1 \le t \le N-M$. Note that complex-valued data is assumed,
$a_{M,0}$ is defined as unity, the $a_{M,k}$ are the AR parameters at order M,
and the $x_t$ are the data samples.
To obtain estimates of the AR parameters, Burg minimized the sum of the
forward and backward prediction error energies,

$$ e_M = \sum_{t=1}^{N-M} f_{M,t}\, f_{M,t}^{*} + \sum_{t=1}^{N-M} b_{M,t}\, b_{M,t}^{*} \tag{3} $$

subject to the constraint that the AR parameters satisfy the Levinson recursion

$$ a_{M,k} = a_{M-1,k} + a_{M,M}\, a_{M-1,M-k}^{*} \tag{4} $$

for all orders from 1 to M. This constraint was motivated by Burg's desire to
have a stable AR filter (poles within the unit circle). Figure 1 is a
flowchart of the Burg algorithm, based on a modification by Anderson [12] of the
original Burg algorithm. A computational complexity analysis of the modified
Burg algorithm indicates that $3NM - M^2 - 2N - M$ complex adds,
$3NM - M^2 - N + 3M$ complex multiplications, and M real divisions are
required. Storage of 3N+M+2 complex words is also required.
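
The quantities in (1) through (4) can be evaluated directly for a given
coefficient set. The Python sketch below is an illustration only; the
function names are hypothetical and lists are indexed from zero, so list
entry t corresponds to time index t+1 in the equations.

```python
import cmath


def prediction_errors(x, a):
    """Forward and backward errors of (1) and (2) for a = [1, a1, ..., aM]."""
    m, n = len(a) - 1, len(x)
    a = [complex(c) for c in a]
    fwd = [sum(a[k] * x[t + m - k] for k in range(m + 1)) for t in range(n - m)]
    bwd = [sum(a[k].conjugate() * x[t + k] for k in range(m + 1))
           for t in range(n - m)]
    return fwd, bwd


def error_energy(x, a):
    """Sum of forward and backward prediction error energies, as in (3)."""
    fwd, bwd = prediction_errors(x, a)
    return sum(abs(v) ** 2 for v in fwd) + sum(abs(v) ** 2 for v in bwd)


def levinson_step(a_prev, refl):
    """Order update (4): extend [1, a1, ..., a(M-1)] with a new last term refl."""
    ext = [complex(c) for c in a_prev] + [0j]
    top = len(ext) - 1
    return [ext[k] + refl * ext[top - k].conjugate() for k in range(top + 1)]


if __name__ == "__main__":
    z = cmath.exp(0.7j)
    tone = [z ** t for t in range(16)]
    print(error_energy(tone, [1, 0]))    # no prediction at all
    print(error_energy(tone, [1, -z]))   # first-order predictor matched to the tone
    print(levinson_step([1, -0.5], 0.2))
```

For a single noise-free complex tone, a first-order predictor with its
coefficient matched to the tone drives both error sequences to zero, whereas
a zero coefficient leaves the full signal energy in the errors.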


Marple  Least  Squares  Algorithm  Estimate  of  the  AR  Spectrum 


A recursive algorithm has been found by this author [13] for the exact
least squares solution of the AR parameter estimates using forward and
backward linear prediction. The algorithm flowchart is shown in Figure 2,
although no proof is provided here.

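To obtain the M normal equations for the LS algorithm, substitute (1) and
(2) into (3) and determine the minimum of $e_M$ by setting the derivatives of
$e_M$ with respect to all the AR parameters $a_{M,1}$ through $a_{M,M}$ to
zero. This yields

$$ \frac{\partial e_M}{\partial a_{M,i}} = \sum_{j=0}^{M} a_{M,j}\, r_M(i,j) = 0 \tag{5} $$

for $i = 1, \ldots, M$, where $a_{M,0} = 1$ by definition, and

$$ r_M(i,j) = \sum_{k=1}^{N-M} \left( x_{k+M-j}\, x_{k+M-i}^{*} + x_{k+i}\, x_{k+j}^{*} \right) \tag{6} $$

for $0 \le i,j \le M$. The minimum prediction error energy may be determined
to be

$$ e_M = \sum_{j=0}^{M} a_{M,j}\, r_M(0,j). \tag{7} $$

Expressions (5) and (7) can be combined into a single (M+1) by (M+1) matrix
expression

$$ R_M A_M = E_M \tag{8} $$

where

$$ A_M = \begin{bmatrix} 1 \\ a_{M,1} \\ \vdots \\ a_{M,M} \end{bmatrix}, \quad
E_M = \begin{bmatrix} e_M \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad
R_M = \begin{bmatrix} r_M(0,0) & \cdots & r_M(0,M) \\ \vdots & & \vdots \\ r_M(M,0) & \cdots & r_M(M,M) \end{bmatrix}. \tag{9} $$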
Ulrych and Clayton [10] were the first to propose the least squares
relationship (5) for AR parameter estimation, in which the Levinson recursion
constraint has been removed. They found the LS estimates by computing the
$r_M(i,j)$ terms directly and then by solving (8) for the vector $A_M$ by
matrix inversion. This requires on the order of $M^3$ computational operations,
which places it at a computational disadvantage with respect to the Burg
algorithm with $M^2$ operations.

Expression (9), though, has a structure that can be exploited to generate
an algorithm of order $M^2$ operations. Although the details are not presented
here, the algorithm was motivated by a similar algorithm developed by Morf et
al. [14]. Examination of $R_M$ will show that this matrix has both hermitian
symmetry [ $r_M(i,j) = r_M^{*}(j,i)$ ] and hermitian persymmetry
[ $r_M(i,j) = r_M^{*}(M-i,M-j)$ ]. It is not Toeplitz, although it may be
decomposed into a function of the Toeplitz matrix $T_M$,

$$ R_M = T_M^{H} T_M + (T_M^{r})^{H}\, T_M^{r}, \tag{10} $$

where

$$ T_M = \begin{bmatrix}
x_{M+1} & x_M & \cdots & x_1 \\
x_{M+2} & x_{M+1} & \cdots & x_2 \\
\vdots & & & \vdots \\
x_{2M+1} & x_{2M} & \cdots & x_{M+1} \\
\vdots & & & \vdots \\
x_N & x_{N-1} & \cdots & x_{N-M}
\end{bmatrix} \tag{11} $$

with $T_M^{r}$ denoting the conjugated and reversed matrix

$$ T_M^{r} = \begin{bmatrix}
x_{N-M}^{*} & x_{N-M+1}^{*} & \cdots & x_N^{*} \\
\vdots & & & \vdots \\
x_1^{*} & x_2^{*} & \cdots & x_{M+1}^{*}
\end{bmatrix} \tag{12} $$

and the superscript H denoting the complex conjugate transpose operation.
Thus, $R_M$ has a structure composed of the sum of two products of Toeplitz
data matrices. It is this underlying structure that allows a recursive
algorithm of order $M^2$ operations to be generated.
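
For reference, the direct (non-recursive) solution of the LS normal equations
can be sketched in Python as follows. This is the straightforward
matrix-solution approach, not the fast recursive algorithm of this paper: it
forms the matrix entries as sums of lagged products over the forward and
backward data windows, checks the two symmetry properties, and solves the M
equations by Gaussian elimination. All function names are hypothetical.

```python
import cmath


def ls_normal_matrix(x, m):
    """(M+1) x (M+1) matrix of forward plus backward lagged products."""
    n = len(x)
    return [[sum(x[k + m - i].conjugate() * x[k + m - j]
                 + x[k + i] * x[k + j].conjugate() for k in range(n - m))
             for j in range(m + 1)] for i in range(m + 1)]


def is_hermitian_persymmetric(r, tol=1e-9):
    """Check r(i,j) = conj r(j,i) and r(i,j) = conj r(M-i,M-j)."""
    m = len(r) - 1
    return all(abs(r[i][j] - r[j][i].conjugate()) < tol
               and abs(r[i][j] - r[m - i][m - j].conjugate()) < tol
               for i in range(m + 1) for j in range(m + 1))


def solve(mat, rhs):
    """Gaussian elimination with partial pivoting for a small complex system."""
    n = len(rhs)
    aug = [list(row) + [v] for row, v in zip(mat, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda row: abs(aug[row][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for row in range(c + 1, n):
            factor = aug[row][c] / aug[c][c]
            for q in range(c, n + 1):
                aug[row][q] -= factor * aug[c][q]
    sol = [0j] * n
    for row in reversed(range(n)):
        acc = aug[row][n] - sum(aug[row][q] * sol[q] for q in range(row + 1, n))
        sol[row] = acc / aug[row][row]
    return sol


def ls_ar(x, m):
    """Return ([1, a1, ..., aM], minimum error energy) by direct solution."""
    r = ls_normal_matrix(x, m)
    tail = solve([row[1:] for row in r[1:]], [-row[0] for row in r[1:]])
    a = [1 + 0j] + tail
    return a, sum(a[j] * r[0][j] for j in range(m + 1))


if __name__ == "__main__":
    z1, z2 = cmath.exp(0.5j), cmath.exp(1.4j)
    data = [z1 ** t + z2 ** t for t in range(16)]   # two noise-free tones
    coeffs, energy = ls_ar(data, 2)
    print(coeffs, abs(energy))
```

For two noise-free complex tones and order 2, the solution reproduces the
polynomial whose roots are the two tones, and the minimum error energy is
zero to within rounding.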

The LS algorithm requires $NM + 8M^2 + N + 7M - 8$ complex additions,
$NM + 9M^2 + N + 25M - 3$ complex multiplications, and 16M-4 real divisions.
The LS algorithm needs N+4M+15 complex-valued computer memory locations. As a
typical case, consider N=100 samples from an M=30 order AR process. The total
number of multiplications, adds, divisions, and storage locations for the Burg
algorithm are 8181, 8059, 30, and 335 respectively. For the LS algorithm, the
numbers are 11947, 10402, 476, and 235, which is quite comparable to that
required for the Burg algorithm.

Appendix A contains a FORTRAN subroutine for computation of the AR
parameters via the LS algorithm. The computer version of the algorithm contains
several simple checks for both numerical ill-conditioning and order selection
indication. The key terms that are performance indicators of the algorithm
are the error energy $e_M$ and a divisor term called DENOM(M), which is the
only divide term in the whole algorithm. Changes in these have been empirically
found to be sensitive indicators of proper order selection when using the LS
algorithm.

Performance  of  the  LS  Algorithm 

A distinct difference in performance between the LS and Burg algorithms is
illustrated by the spectra in Figure 3. A 41-point sample sequence was
generated, consisting of three sinusoidal components at fractional frequencies
of .3155f, .5155f, and .7655f. All sinusoids had initial phases of 45°. A
gaussian white noise sequence was generated and the sinusoid amplitudes were
selected to yield SNRs of 43 dB, 37 dB, and 37 dB respectively. The
frequencies, initial phases, SNRs, and data segment length of 41 samples were
selected based on conditions established by Fougere [6] as being the most
likely to produce spectral line splitting.

Using the Final Prediction Error (FPE) criterion of Akaike [7] as the rule
for order selection, the minimum FPE of the 41-point sequence with the Burg
algorithm was found at order 23. The AR spectrum based on the 23 AR parameters
estimated by the Burg algorithm is shown in Figure 3a. Extreme line splitting
occurs at each of the three peaks of interest. In addition, many spurious
low-level peaks are apparent in the spectrum. This illustrates the erroneous
results that may occur when an improper order for the AR estimate is selected.
Using the same sample sequence, the LS algorithm selected order 7, based on
the rules of order selection discussed in Appendix A. The AR spectrum of the
LS-algorithm-estimated AR parameters is shown in Figure 3b. The spectrum has
three sharp peaks at the correct frequencies with no spectral line splitting.
For comparison, an AR spectrum using the Burg algorithm for order 7 is shown
in Figure 3c. There is no apparent line splitting, illustrating the necessity
for proper order selection. Comparing the spectra of Figures 3b and 3c, it
may be seen that the skirts for each spectral peak are narrower for the LS
spectrum than for the Burg spectrum. This shows that the poles have moved
closer to the unit circle with the LS approach than with the Burg approach.
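
The FPE order-selection rule mentioned above can be sketched as follows. This
is a minimal Python illustration and not part of the workshop software. The
form FPE(m) = e_m (N+m+1)/(N-m-1), commonly quoted for a sequence with its
mean removed, is assumed here; the function names and the error-power values
are hypothetical.

```python
def fpe(err_power, n_samples, order):
    """Akaike final prediction error at a given model order (assumed form)."""
    if n_samples - order - 1 <= 0:
        raise ValueError("order too large for the data length")
    return err_power * (n_samples + order + 1) / (n_samples - order - 1)


def select_order(err_powers, n_samples):
    """Return the order m that minimizes the FPE.

    err_powers[m] is the prediction error power at order m = 0, 1, ...
    """
    scores = [fpe(p, n_samples, m) for m, p in enumerate(err_powers)]
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical error powers that fall quickly and then level off.
example_powers = [10.0, 4.0, 1.0, 0.98, 0.97, 0.965]
best_order = select_order(example_powers, n_samples=41)
```

Because the penalty factor grows with the order, the rule stops rewarding
additional parameters once the error power levels off.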

Ulrych  and  Clayton  [10]  have  examined  the  sensitivity  of  the  Burg  and  LS 
spectra  to  the  initial  phase  of  processes  consisting  of  one  or  two  sinusoids 
in  noise.  They  found  that  the  LS  estimate  is  fairly  insensitive  to  the  initial 
phase and yields an accurate determination of the sinusoid frequency, whereas
the  Burg  estimate  had  severe  variance  in  the  frequency  location  of  the  spectral 
peak  as  a  function  of  initial  phase. 

Nuttall [11] has examined the performance of the LS approach for a
non-sinusoidal process. He generated real-valued sequences from the fourth
order AR process

    x_k = \sum_{n=1}^{4} a_n x_{k-n} + w_k        (13)




where a_1 = 2.7607, a_2 = -3.8106, a_3 = 2.6535, and a_4 = -0.9238. White
Gaussian noise w_k was added to the process to yield a 10 dB SNR. One hundred
Burg and LS spectra were generated from independent 40-sample sequences of
this AR process in steady state. Figures 4a and 4b show the overlapped spectra
for the two algorithms, while Figures 4c and 4d indicate the average of all
100 spectra for each algorithm compared to the true AR spectrum. The model
order was preselected at M=4. One observation that can be made is that the LS
technique tends to have less variability in the skirts, but more spiky
estimates near the peaks of the spectrum, than seen in Burg algorithm spectra.
That is, the LS algorithm produces AR spectra with less frequency variability,
but more power spectral density variability. The greater PSD variability can
be attributed to the fact that, unlike the Burg algorithm, the LS algorithm
does not restrict the poles from moving close to the unit circle. Since the
area under the spectral density curve, rather than the peak height, is
proportional to power, the variability in PSD amplitude is not of much
concern. Rather, obtaining unbiased, accurate estimates of the spectral peak
frequencies is more important for most applications.
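
The fourth order AR process of Eq. (13) can be simulated and its true spectrum
evaluated with a short sketch such as the one below. It is a Python
illustration under stated assumptions: unit-variance driving noise, a burn-in
period standing in for "steady state", and hypothetical function names. The
additional measurement noise used to set the 10 dB SNR is not modeled.

```python
import cmath
import math
import random

# AR(4) coefficients quoted in the text for Eq. (13).
AR_COEFFS = [2.7607, -3.8106, 2.6535, -0.9238]


def simulate_ar(coeffs, n_samples, noise_std=1.0, burn_in=2000, seed=0):
    """Generate x_k = sum_n a_n x_{k-n} + w_k and drop the start-up transient."""
    rng = random.Random(seed)
    order = len(coeffs)
    x = [0.0] * order
    for _ in range(burn_in + n_samples):
        past = x[-1:-order - 1:-1]  # x_{k-1}, x_{k-2}, ..., x_{k-order}
        drive = rng.gauss(0.0, noise_std)
        x.append(sum(a * v for a, v in zip(coeffs, past)) + drive)
    return x[-n_samples:]


def ar_psd(coeffs, freq, noise_var=1.0):
    """True AR spectrum at a fractional frequency in cycles per sample."""
    denom = 1.0 - sum(a * cmath.exp(-2j * math.pi * freq * (n + 1))
                      for n, a in enumerate(coeffs))
    return noise_var / abs(denom) ** 2


segment = simulate_ar(AR_COEFFS, 40)
```

Evaluating `ar_psd` on a grid of frequencies gives the reference curve against
which averaged Burg and LS spectra of the kind shown in Figures 4c and 4d
would be compared.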

No  cases  of  spectral  line  splitting  have  been  observed  using  the  order 
selection  criteria  given  in  Appendix  A.  In  practice,  then,  the  LS  algorithm 
appears  to  yield  AR  parameters  that  produce  stable  spectra,  even  when  pole 
estimates  fall  outside  the  unit  circle  (less  than  1%  of  the  time). 

Summary 


A new recursive algorithm that provides AR parameters for an AR spectral
estimate based on forward and backward linear prediction has been introduced.
It has the same order of computational complexity as the popular Burg
algorithm. Examples have been provided to illustrate the improved performance
of spectra generated with the LS algorithm when compared to spectra generated
with the Burg algorithm. Improvements include reduced sensitivity to initial
phase, reduced bias in the frequency estimate, less frequency variability over
an ensemble of spectra made from the same process, and absence of spectral
line splitting. All these factors suggest that the LS algorithm is an
attractive alternative to the Burg algorithm for AR spectral estimation.


References 


[1]  Marple, S.L. Jr., 1978, "Frequency Resolution of High Resolution Spectrum
Analysis Techniques," in Proc. First RADC Spectrum Estimation Workshop,
pp. 19-35.

[2]  Ulrych, T.J. and Bishop, T.N., 1975, "Maximum Entropy Spectral Analysis
and Autoregressive Decomposition," Rev. Geophysics, vol. 13, pp. 183-200.

[3]  Kay, S. and Marple, S.L. Jr., 1979, "Sources of and Remedies for Spectral
Line Splitting in Autoregressive Spectrum Analysis," in Record IEEE ICASSP,
pp. 151-154.

[4]  Anderson, N.O., 1974, "On the Calculation of Filter Coefficients for
Maximum Entropy Analysis," Geophysics, vol. 39, pp. 69-72.




[5]  Burg, J.P., Maximum Entropy Spectral Analysis, 1975, PhD Thesis, Dept. of
Geophysics, Stanford University.

[6]  Fougere, P.F., Zawalick, E.J., Radoski, H.R., 1976, "Spontaneous Line
Splitting in Maximum Entropy Power Spectrum Analysis," Physics Earth and
Planetary Interiors, vol. 12, pp. 201-207.

[7]  Akaike, H., 1970, "Statistical Predictor Identification," Ann. Inst. Stat.
Math., vol. 22, pp. 203-217.

[8]  Swingler, D.N., 1979, "A Comparison Between Burg's Maximum Entropy Method
and a Nonrecursive Technique for the Spectral Analysis of Deterministic
Signals," Jour. Geo. Res., vol. 84, pp. 679-685.

[9]  Fougere, P.F., 1977, "A Solution to the Problem of Spontaneous Line
Splitting in Maximum Entropy Power Spectrum Analysis," Jour. Geo. Res.,
vol. 82, pp. 1051-1054.

[10] Ulrych, T.J. and Clayton, R.W., 1976, "Time Series Modelling and Maximum
Entropy," Physics Earth and Planetary Interiors, vol. 12, pp. 188-200.

[11] Nuttall, A.H., 26 March 1976, "Spectral Analysis of a Univariate Process
with Bad Data Points, Via Maximum Entropy and Linear Predictive Techniques,"
Naval Underseas Systems Command, TR 5303.

[12] Anderson, N.O., 1978, "Comments on the Performance of Maximum Entropy
Algorithms...," Proc. IEEE, vol. 66, pp. 1581-1582.

[13] Marple, S.L. Jr., paper submitted for publication to IEEE Trans. ASSP.

[14] Morf, M., Dickinson, B., Kailath, T., Vieira, A., 1977, "Efficient
Solution of Covariance Equations for Linear Prediction," IEEE Trans. Acoustics,
Speech, & Sig. Proc., vol. ASSP-25, pp. 429-433.




Appendix A


Figure 5 is a FORTRAN subroutine listing for implementation of the LS
algorithm with complex-valued data. The subroutine is dimensioned to accept
512 data values and is fixed to compute a maximum of 50 AR parameters. Note
that arrays C and D must be dimensioned by one more than the maximum number of
AR coefficients.

The following input parameters are passed to the subroutine:

X      = Array of complex-valued data samples
N      = Number of data samples in array X
MMAX   = Maximum number of AR parameters to be estimated
TOL    = Tolerance value for two of the stopping criteria. Empirically
         set to 10^{-3} for minicomputer implementation and to 10^{-4} for
         large scale computer implementation.

The following output parameters are passed from the subroutine:

M      = Number of AR parameters computed when a stopping criterion was
         satisfied; note that M ≤ MMAX.
A      = Array of complex-valued AR parameters
P      = Prediction error energy (e_M in text)
ENERGY = Twice the total signal energy in the data samples (Eq.101)
STATUS = Integer indicating the stopping criterion that terminated the
         recursion at order M.

Five values of STATUS are possible. STATUS=1 is the normal program exit
when the maximum order is reached, M = MMAX. STATUS=2 indicates the program
terminated when e_M/ENERGY < TOL, that is, the residual prediction error
energy is a small fraction of the total signal energy. STATUS=3 indicates the
program terminated when (e_{M-1} - e_M)/e_{M-1} < TOL, that is, the residual
prediction error energy at order M has changed by only a small fraction from
the previous order M-1. This is the stopping criterion encountered most
frequently. STATUS=4 occurs when e_M < 0, indicating numerical
ill-conditioning or possibly a singular matrix. This was the stopping
criterion encountered in Figure 4b for M=8. As a result of this condition, the
order M=7 was selected for valid AR parameter estimates. STATUS=5 indicates
the algorithm terminated when DENOM(M) ≤ 0. This is also an indicator of
numerical ill-conditioning within the algorithm, since DENOM must be
positive-valued.
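
The five stopping rules can be summarized in a few lines. The sketch below is
a Python paraphrase for illustration only, with a hypothetical function name
and argument list. It gathers the tests in one place, whereas the subroutine
checks DENOM at the start of each recursion, and it treats equality with TOL
as a stop.

```python
def lstsqs_status(p, p_old, energy, denom, m, mmax, tol):
    """Return the STATUS code described above, or 0 to continue the recursion.

    p      : prediction error energy at the current order (e_M)
    p_old  : prediction error energy at the previous order (e_{M-1})
    energy : twice the total signal energy
    denom  : the divisor term DENOM(M)
    """
    if denom <= 0.0:
        return 5  # DENOM must be positive
    if p <= 0.0:
        return 4  # ill-conditioning or a singular matrix
    if (p_old - p) / p_old <= tol:
        return 3  # error energy has stopped decreasing appreciably
    if p / energy <= tol:
        return 2  # residual is a small fraction of the signal energy
    if m == mmax:
        return 1  # normal exit at the maximum order
    return 0
```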



[Flowchart blocks: Initialization; Levinson Recursion; Order update.]

FIGURE 1.  Burg Algorithm Flowchart.


[Flowchart blocks: Set Up Initial Conditions; Time Shift Update; Order Update.
An asterisk denotes complex conjugate.]

FIGURE 2.  Marple Least Squares Forward and Backward Linear Prediction
Algorithm.


FIGURE  5.  FORTRAN  Program  of  LS  Algorithm. 


      SUBROUTINE LSTSQS (N,M,MMAX,X,A,P,TOL,STATUS,ENERGY)
      COMPLEX A(50),X(512)
      COMPLEX C(51),D(51),CORR(50),E,F,H,S,U,V
      COMPLEX SAVE1,SAVE2,DELTA,C1,C2,C3,C4,C5,C6
      INTEGER STATUS
C
C  INITIALIZATION SECTION (0TH ORDER)
C
      ENERGY=0.
      DO 10 I=1,N
   10 ENERGY=ENERGY+CABS(X(I))**2
      ENERGY=2.*ENERGY
      P=ENERGY
      E=X(1)
      F=CONJG(X(N))
      H=(CONJG(X(1))*CONJG(X(N)))/ENERGY
      S=(X(N)*CONJG(X(1)))/ENERGY
      V=(CONJG(X(1))*CONJG(X(1)))/ENERGY
      U=(X(N)*X(N))/ENERGY
      G=CABS(X(1))**2/ENERGY
      R=CABS(X(N))**2/ENERGY
      C(1)=CONJG(X(1))/ENERGY
      D(1)=X(N)/ENERGY
      M=0
C
C  TIME-SHIFTED VARIABLES UPDATE
C
 1000 POLD=P
      DENOM=(1.-G)*(1.-R)-CABS(H)**2
C  DENOM MUST BE POSITIVE (SEE STATUS=5 IN TEXT)
      IF (DENOM .GT. 0.) GO TO 20
      STATUS=5
      RETURN
   20 C1=H*E*CONJG(F)
      R1=2.*REAL(C1)
      R2=(1.-R)*CABS(E)**2+(1.-G)*CABS(F)**2+R1
      ALPHA=1./(1.+R2/(DENOM*P))
      P=ALPHA*P
      C1=(E*(1.-R)+F*CONJG(H))/DENOM
      C2=(F*(1.-G)+H*E)/DENOM
      C3=(H*S+V*(1.-R))/DENOM
      C4=(V*CONJG(H)+S*(1.-G))/DENOM
      C5=(H*U+S*(1.-R))/DENOM
      C6=(S*CONJG(H)+U*(1.-G))/DENOM
      IF (M .EQ. 0) GO TO 40
      DO 30 I=1,M
      I1=I+1
   30 A(I)=ALPHA*(A(I)+C1*C(I1)+C2*D(I1))
   40 M2=M/2+1
      DO 50 I=1,M2
      MI=M+2-I
      SAVE1=CONJG(C(I))
      SAVE2=CONJG(D(I))
C  THE MIDDLE ELEMENT (I=MI) IS UPDATED ONCE, FROM THE SAVED VALUES
      IF (I .EQ. MI) GO TO 45
      C(I)=C(I)+C3*CONJG(C(MI))+C4*CONJG(D(MI))
      D(I)=D(I)+C5*CONJG(C(MI))+C6*CONJG(D(MI))
   45 C(MI)=C(MI)+C3*SAVE1+C4*SAVE2
      D(MI)=D(MI)+C5*SAVE1+C6*SAVE2
   50 CONTINUE
C
C  ORDER UPDATE
C
      M=M+1
      DELTA=(0.,0.)
      IF (M .EQ. 1) GO TO 70
C  SHIFT THE CORRELATION TERMS UP ONE LAG, HIGHEST INDEX FIRST
      DO 60 JJ=2,M
      J=M+1-JJ
      CORR(J+1)=CORR(J)-X(N-J+1)*CONJG(X(N-M+1))-X(M)*CONJG(X(J))
   60 DELTA=DELTA+CORR(J+1)*A(J)
   70 C1=(0.,0.)
      NM=N-M
      DO 80 K=1,NM
   80 C1=C1+X(K+M)*CONJG(X(K))
      CORR(1)=2.*C1
      DELTA=DELTA+CORR(1)
      C1=DELTA/P
      A(M)=-C1
      HOLD=P
      P=HOLD-CABS(DELTA)**2/HOLD
      IF (M .EQ. 1) GO TO 100
      M2=M/2
      DO 90 I=1,M2
      MI=M-I
      SAVE1=CONJG(A(I))
      A(I)=A(I)-C1*CONJG(A(MI))
      IF (I .EQ. MI) GO TO 90
      A(MI)=A(MI)-C1*SAVE1
   90 CONTINUE
C
C  PREDICTION COEFFICIENTS UPDATE
C
  100 E=X(M+1)
      F=CONJG(X(N-M))
      DO 110 I=1,M
      MI=M+1-I
      E=E+X(MI)*A(I)
  110 F=F+CONJG(X(N-M+I))*A(I)
C
C  AUXILIARY VARIABLES ORDER UPDATE
C
      C1=CONJG(E)/P
      C2=CONJG(F)/P
      DO 120 II=1,M
      I=M-II+1
      I1=I+1
      C(I1)=C(I)+C1*A(I)
  120 D(I1)=D(I)+C2*A(I)
      C(1)=C1
      D(1)=C2
C
C  SCALARS UPDATE
C
      C1=H*S*CONJG(V)
      C2=U*H*CONJG(S)
      R1=2.*REAL(C1)
      R2=2.*REAL(C2)
      R1=(1.-R)*CABS(V)**2+(1.-G)*CABS(S)**2+R1
      R2=(1.-G)*CABS(U)**2+(1.-R)*CABS(S)**2+R2
      G=G+(R1/DENOM)+(CABS(E)**2/P)
      R=R+(R2/DENOM)+(CABS(F)**2/P)
      H=(0.,0.)
      S=(0.,0.)
      V=(0.,0.)
      U=(0.,0.)
      M1=M+1
      DO 130 I=1,M1
      H=H+CONJG(X(N-M-1+I))*C(I)
      S=S+X(N+1-I)*C(I)
      V=V+CONJG(X(I))*C(I)
  130 U=U+X(N+1-I)*D(I)
C
C  CHECK FOR STOPPING CRITERIA
C
      IF (P .GT. 0.) GO TO 200
      STATUS=4
      RETURN
  200 IF (((POLD-P)/POLD) .GT. TOL) GO TO 210
      STATUS=3
      RETURN
  210 IF ((P/ENERGY) .GT. TOL) GO TO 220
      STATUS=2
      RETURN
  220 IF (M .NE. MMAX) GO TO 1000
      STATUS=1
      RETURN
      END


ARMA  SPECTRAL  ESTIMATION:  AN  ITERATIVE  PROCEDURE 


James  A.  Cadzow 

Department  of  Electrical  Engineering 
Virginia  Polytechnic  Institute  and  State  University 
Blacksburg,  Virginia  24061 
(703)  961-5694 


ABSTRACT 

An ARMA Autocorrelation Estimation Method (AEM) for generating the
best rational spectral estimate of a stationary random discrete-time series
is presented. This estimation is to be based on N contiguous observations
of the infinite length time-series. As in the maximum entropy method, the
AEM in effect extrapolates an autocorrelation estimate beyond the data
limited range with the explicit objective of achieving improved spectral
resolution. Unlike the maximum entropy method, however, the AEM spectral
estimator provides for the existence of zeroes as well as poles in the
resultant power spectral density and it is thereby more robust in nature.
Furthermore, this method does not require an excessively large number of
observations to be effective, a property not shared by most other rational
ARMA spectrum estimators.


This  work  was  supported  in  part  by  the  Signal  Processing  Section, 
Surveillance  Technology  Branch,  Rome  Air  Development  Center  through  the 
Post  Doctoral  Program  under  Contract  F30602-75-0018. 




I.  INTRODUCTION 


In a variety of applications, it is desired to identify the spectral
characteristics of a wide-sense stationary discrete-time process {x(n)}.
The elements of this random process may arise naturally through a
discrete-time phenomenon, or by means of uniformly sampling a wide-sense
stationary continuous-time process. Whatever the case, the process's
spectral characterization is completely determined by its corresponding
autocorrelation sequence

    r_x(n) = E\{x(k)\, x(k+n)\},    n = 0, \pm 1, \pm 2, \ldots        (1)

in which E denotes the expected value operator. The z-transform of this
sequence is commonly referred to as the "power spectral density" associated
with the process and is specified by

    S_x(z) = \sum_{n=-\infty}^{\infty} r_x(n)\, z^{-n}        (2)

where z is a complex valued variable. In the spectral estimation literature,
one commonly replaces the z variable by e^{j\omega} to obtain the equivalent
Fourier transform characterization as denoted by S_x(e^{j\omega}). With this
Fourier representation, one is able to express the spectral characterization
as a function of the real frequency variable \omega.

Although relationship (2) yields an explicit procedure for determining
the power spectral density, its utilization is restricted to those few
situations in which one has access to the entire time history of the
autocorrelation sequence. In most practical applications, one has little
if any such a priori knowledge. More typically, there is available only a
set of N contiguous observations

    x(1), x(2), \ldots, x(N)        (3)

of the infinite length process upon which to base a spectral estimation.
This inability to observe a random process over all time reflects real
world constraints which prevail in almost any application. The problem of
concern here is then that of using this "incomplete" set of observations
to estimate the underlying power spectral density. In essence we seek to
use the finite data (3) to construct a function \hat{S}_x(z) which best
approximates S_x(z) in some fashion.


The classical spectral estimation approach is to use the N process
observations to compute autocorrelation estimates (i.e., \hat{r}_x(n) for
n = 0, \pm 1, \ldots, \pm(N-1)), and then to take a discrete Fourier
transform of a weighted version of these autocorrelation estimates (e.g.,
see ref. [1]). This procedure often leads to unsatisfactory results, however,
since the generally erroneous assumption being made is that the
autocorrelation sequence is identically zero outside the data limited range
|n| \le N-1.

This shortcoming has been recognized by investigators and a number of
alternative methods that do not impose this restrictive assumption have
since been developed. By and large, these methods seek to approximate the
spectral density by a rational function. A rational model can be justified
on the basis that any continuous power spectral density can be approximated
arbitrarily closely by a rational function of sufficiently high order [2].
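
The classical approach described above can be sketched in a few lines. This is
a Python illustration under stated assumptions: real-valued data, lag
estimates normalized by N, and a triangular (Bartlett) lag window chosen only
as an example of the "weighted version". The function names are hypothetical.

```python
import math


def biased_autocorr(x, max_lag):
    """Autocorrelation estimates for lags 0..max_lag, each normalized by N."""
    n = len(x)
    return [sum(x[k] * x[k + lag] for k in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def classical_psd(x, max_lag, freq):
    """Windowed Fourier transform of the lag estimates at one frequency.

    freq is in cycles per sample. Lags beyond max_lag are implicitly taken
    to be zero, which is the assumption criticized in the text.
    """
    r = biased_autocorr(x, max_lag)
    total = r[0]
    for lag in range(1, max_lag + 1):
        weight = 1.0 - lag / (max_lag + 1)  # triangular lag window
        total += 2.0 * weight * r[lag] * math.cos(2.0 * math.pi * freq * lag)
    return total


# A single sampled sinusoid at 0.2 cycles per sample.
tone = [math.cos(2.0 * math.pi * 0.2 * k) for k in range(64)]
on_peak = classical_psd(tone, 16, 0.2)
off_peak = classical_psd(tone, 16, 0.4)
```

The truncation to `max_lag` is what limits resolution here: the estimate
cannot separate spectral features closer together than roughly the reciprocal
of the longest lag used.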

Undoubtedly, the most widely used of these models is the "all-pole"
rational spectral estimator which has given rise to the basically equivalent
maximum entropy, linear predictive coding, and autoregressive methods.
In each case, one seeks to determine the coefficients of an all-pole model
so as to optimize a given criterion. In the maximum entropy method, one
selects the "optimum" model to be consistent with the given data
observations while being simultaneously least committal about the remaining
unobserved portion of the random process [3]. On the other hand, the linear
predictive coding and autoregressive methods seek a data dependent whitening
filter in the guise of a one-step predictor [4]. In each of these three
methods, the coefficients characterizing the optimum all-pole model are
obtained by solving a system of linear equations. This ease of model
generation, and the fact that all-pole models can often yield excellent
spectral resolution performance for short data lengths, are the primary
reasons for the wide acceptance of all-pole spectral estimators. It must be
mentioned, however, that these all-pole methods also have serious
shortcomings. For example, if the underlying power spectral density is
rational and contains zeroes as well as poles, an all-pole model can result
in very poor estimates.

Conceptually, a better behaved spectral estimator would result if
the rational spectrum model being used had zeroes as well as poles. In
recognition of this fact, a variety of such models have been generated which
typically use a whitening filter approach (e.g., see refs. [5] and [6]).
These procedures have produced impressive performance when the number of
data observations, N, adequately exceeds the random process's time constant.
When this is not the case, however, their spectral estimation performance
falls off significantly. In order to retain the inherent advantages of
using a zero-pole model while not requiring an excessively large number of
data observations, a procedure which makes explicit use of the
autocorrelation sequence will now be developed.


II.  RATIONAL  SPECTRUM  MODEL 

In this section, the principal implication of assuming a rational
power spectral density model is investigated. The random process {x(n)}
is said to have a rational spectrum if its power spectral density can be
expressed in the form

    S_x(z) = \sigma^2 \frac{B(z)\, B(z^{-1})}{A(z)\, A(z^{-1})}        (4)

where \sigma is a positive real scalar and the spectrum's characteristic
rational function

    \frac{B(z)}{A(z)} = \frac{1 + b_1 z^{-1} + b_2 z^{-2} + \cdots + b_q z^{-q}}
                             {1 + a_1 z^{-1} + a_2 z^{-2} + \cdots + a_p z^{-p}}        (5)

is composed of polynomials A(z) and B(z) which have real coefficients, and
the zeroes of these polynomials all lie within the unit circle of the
z-plane. This rational power spectral density is said to have order (p,q) and
its zeroes and poles are seen to occur in sets of complex conjugate
reciprocals.


The fact that the denominator polynomial A(z) has all its zeroes
located inside the unit circle enables us to provide a convenient system
interpretation of this rational discrete-time process. In particular, let
us consider the stable recursive linear system whose transfer function is
specified by the characteristic rational function (5). This linear system is
then governed by the pth order linear difference equation

    x(n) = e(n) + b_1 e(n-1) + b_2 e(n-2) + \cdots + b_q e(n-q)
           - a_1 x(n-1) - a_2 x(n-2) - \cdots - a_p x(n-p)        (6)

in which e(n) and x(n) denote the excitation and response signals,
respectively. It can be readily shown that if this system is excited by a
stationary white noise process as statistically characterized by

    E\{e(n)\} = 0   and   r_e(n) = \sigma^2 \delta(n)        (7)

then the power spectral density of the response random process {x(n)} is
given precisely by expression (4). Thus, a stationary random process with
a rational power spectral density can always be interpreted as being the
response of a linear system to a white noise excitation. This linear system
is then said to have colored the excitation process and for this reason
we commonly refer to the system as a coloring filter as suggestively
depicted in Figure 1.


[Block diagram: white noise e(n) drives the coloring filter B(z)/A(z), whose
output is the colored noise x(n).]

Figure 1.  Generation of a Rational Spectrum
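
The coloring-filter interpretation can be tried out with a direct
implementation of difference equation (6). The sketch below is a Python
illustration with hypothetical function names; it assumes zero initial
conditions and takes b = [b_1, ..., b_q] and a = [a_1, ..., a_p] as plain
lists.

```python
import random


def arma_filter(b, a, e):
    """Apply x(n) = e(n) + sum_k b_k e(n-k) - sum_k a_k x(n-k), zero start."""
    x = []
    for n in range(len(e)):
        value = e[n]
        value += sum(bk * e[n - k]
                     for k, bk in enumerate(b, start=1) if n - k >= 0)
        value -= sum(ak * x[n - k]
                     for k, ak in enumerate(a, start=1) if n - k >= 0)
        x.append(value)
    return x


def colored_noise(b, a, n_samples, sigma=1.0, seed=0):
    """Drive the filter with zero-mean white Gaussian noise of std sigma."""
    rng = random.Random(seed)
    excitation = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
    return arma_filter(b, a, excitation)


# Impulse response of the all-pole filter 1 / (1 - 0.5 z^{-1}).
impulse_response = arma_filter([], [-0.5], [1.0, 0.0, 0.0, 0.0])
```

Passing an empty `a` gives a moving average response and an empty `b` gives an
autoregressive one.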




In the spectrum estimation literature, the general linear recursive
system (6) is commonly referred to as an autoregressive-moving
average (ARMA) model. An ARMA model is seen to give rise to a rational
spectrum (4) which contains zeroes (via B(z) B(z^{-1})) as well as poles
(via A(z) A(z^{-1})) and is often referred to as a zero-pole model. The ARMA
model is the most general rational spectrum model possible. When the
numerator polynomial is constrained to be one (i.e., B(z) = 1), the
important subclass of autoregressive (AR) models is obtained. This all-pole
model is the one most often used in spectral estimation primarily due to
the ease with which one can determine the optimum A(z) polynomial which
corresponds to a given finite set of observations (3). It is to be noted
that an AR model arises whenever one uses the basically equivalent maximum
entropy, linear predictive coding, or autoregressive methods of spectral
estimation. Another subclass of rational spectrum models is obtained by
constraining the denominator polynomial to be one (i.e., A(z) = 1). This
all-zero model is commonly referred to as the moving average (MA) model.
The rational spectrum associated with each of these models is shown in
Table 1.


Process     Spectrum

ARMA        \sigma^2 |B(e^{j\omega}) / A(e^{j\omega})|^2

AR          \sigma^2 / |A(e^{j\omega})|^2

MA          \sigma^2 |B(e^{j\omega})|^2

Table 1.  Rational Power Spectral Density Classes
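
All three entries of Table 1 can be evaluated with one routine, since the AR
and MA cases are the ARMA spectrum with B = 1 or A = 1. The following is a
minimal Python sketch; the function name and the example coefficients are
hypothetical.

```python
import cmath


def rational_psd(b, a, sigma2, omega):
    """Evaluate sigma^2 |B(e^{j omega}) / A(e^{j omega})|^2.

    b = [b_1, ..., b_q] and a = [a_1, ..., a_p]; both polynomials have a
    leading coefficient of one, as in Eq. (5). Use b = [] for an AR model
    and a = [] for an MA model.
    """
    z_inv = cmath.exp(-1j * omega)
    num = 1.0 + sum(bk * z_inv ** k for k, bk in enumerate(b, start=1))
    den = 1.0 + sum(ak * z_inv ** k for k, ak in enumerate(a, start=1))
    return sigma2 * abs(num / den) ** 2


ar_value = rational_psd([], [-0.5], 1.0, 0.0)       # pole near omega = 0
ma_value = rational_psd([1.0], [], 1.0, cmath.pi)   # zero at omega = pi
```

The MA example shows what an all-pole model cannot reproduce exactly: a
spectrum that drops to zero at a finite frequency.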


An examination of Table 1 reveals the more robust behavior of the
ARMA model in providing rational spectral estimates. This robustness was
recently demonstrated empirically in reference [5] where the ARMA model
was found to yield overall superior spectral estimates for a variety of
problems. Unless one has a priori knowledge which would indicate otherwise,
it then seems clear that the ARMA model provides the obvious choice
when seeking a rational spectral estimate. Unfortunately, the practical
problem of determining the optimum A(z) and B(z) polynomials which
constitute the ARMA model is an analytically intractable one and necessitates
an algorithmic solution. Moreover, unless the observed data length N is
sufficiently large, the standard whitening filter approach can yield poor
spectral estimates. A procedure for resolving this shortcoming will now
be presented.




III.  AUTOCORRELATION  APPROXIMATION  MODEL 


In order to remove the apparent incompatibility of determining
general ARMA spectral estimates from short data length observations, it
will be beneficial to examine the autocorrelation sequence. In particular,
our interest will be directed towards the causal segment of the
autocorrelation sequence as defined by

    r_x^+(n) = \begin{cases} r_x(n), & n \ge 0 \\ 0, & n < 0 \end{cases}        (8)

Using the fact that the autocorrelation sequence is an even function of n,
the following expression is readily established

    r_x(n) = r_x^+(n) + r_x^+(-n) - r_x(0)\, \delta(n)        (9)

where \delta(n) denotes the Kronecker delta sequence. Upon taking the
z-transform of this relationship, we obtain the associated power spectral
density function

    S_x(z) = S_x^+(z) + S_x^+(z^{-1}) - r_x(0)        (10)

in which S_x^+(z) denotes the z-transform of the causal sequence r_x^+(n).
Clearly, there exists a one-to-one correspondence between the two
z-transforms S_x(z) and S_x^+(z).
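
As a worked check of relationships (8) to (10), consider the illustrative
autocorrelation sequence r_x(n) = \rho^{|n|} with |\rho| < 1. This example is
an assumption introduced here and is not taken from the paper.

```latex
\begin{aligned}
r_x^+(n) &= \rho^{n}, \quad n \ge 0, \\
S_x^+(z) &= \sum_{n=0}^{\infty} \rho^{n} z^{-n} = \frac{1}{1 - \rho z^{-1}}, \\
S_x(z) &= \frac{1}{1 - \rho z^{-1}} + \frac{1}{1 - \rho z} - 1
        = \frac{(1 - \rho z) + (1 - \rho z^{-1}) - (1 - \rho z^{-1})(1 - \rho z)}
               {(1 - \rho z^{-1})(1 - \rho z)} \\
       &= \frac{1 - \rho^{2}}{(1 - \rho z^{-1})(1 - \rho z)} .
\end{aligned}
```

The result has the rational form (4) with \sigma^2 = 1 - \rho^2, B(z) = 1, and
A(z) = 1 - \rho z^{-1}, and the denominator of S_x^+(z) is the same A(z).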


When the power spectral density is of the rational form as given in
expression (4), a little thought shows that S_x^+(z) must be of the
specific form

    S_x^+(z) = \frac{c_0 + c_1 z^{-1} + \cdots + c_p z^{-p}}
                    {1 + a_1 z^{-1} + \cdots + a_p z^{-p}}        (11)

in which this representation's denominator polynomial corresponds to the
A(z) polynomial of expression (5). Upon multiplying both sides of this
representation by A(z) and then taking the inverse z-transform, the
fundamental recursive relationship

    r_x^+(n) = c_0 \delta(n) + c_1 \delta(n-1) + \cdots + c_p \delta(n-p)
               - a_1 r_x^+(n-1) - a_2 r_x^+(n-2) - \cdots - a_p r_x^+(n-p)        (12)




arises in which the boundary conditions r_x^+(n) = 0 for n < 0 are imposed
to reflect the causal nature of the r_x^+(n) sequence. Thus, if a random
process has a rational spectrum of order p, the elements of the associated
autocorrelation sequence will satisfy this recursive relationship for
appropriate choices of the a_k, c_k coefficients.

Upon examination of relationship (12), it is apparent that knowledge
of the first 2p + 1 values of the autocorrelation sequence will
enable us to uniquely identify its characterizing a_k, c_k coefficients.
This information can then, in turn, be used to construct the underlying
power spectral density via expression (10). We shall now use a version of
this approach to effect a data efficient means of spectral estimation.
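
The identification step implied by relationship (12) can be sketched as
follows. This Python illustration assumes exact autocorrelation values and a
nonsingular set of equations; the function names are hypothetical. For
n = p+1, ..., 2p the delta terms in (12) vanish, which gives p linear
equations in the a_k; the c_k then follow from (12) for n = 0, ..., p.

```python
def causal_autocorr(a, c, n_lags):
    """Generate r+(0), ..., r+(n_lags - 1) from Eq. (12), r+(n) = 0 for n < 0.

    a = [a_1, ..., a_p] and c = [c_0, ..., c_p].
    """
    r = []
    for n in range(n_lags):
        value = c[n] if n < len(c) else 0.0
        value -= sum(ak * r[n - k]
                     for k, ak in enumerate(a, start=1) if n - k >= 0)
        r.append(value)
    return r


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def identify(r, p):
    """Recover (a, c) from the 2p + 1 values r+(0), ..., r+(2p)."""
    lags = range(p + 1, 2 * p + 1)
    rows = [[r[n - k] for k in range(1, p + 1)] for n in lags]
    a = solve_linear(rows, [-r[n] for n in lags])
    c = [r[n] + sum(a[k - 1] * r[n - k]
                    for k in range(1, p + 1) if n - k >= 0)
         for n in range(p + 1)]
    return a, c


r_exact = causal_autocorr([-1.2, 0.5], [1.0, 0.3, -0.2], 5)
a_found, c_found = identify(r_exact, 2)
```

With estimated rather than exact autocorrelation values these equations hold
only approximately, which is why the text turns to a fitting criterion.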

This version must take into account the fact that the autocorrelation
element values are not known a priori. As such, the first step necessitates
the generation of autocorrelation estimates based on the N data
observations provided. A standard estimation rule for achieving this
objective is given by

    \hat{r}_x(n) = \frac{1}{N-n} \sum_{k=1}^{N-n} x(k)\, x(n+k)    for n = 0, 1, \ldots, N-1        (13)

in which it is tacitly assumed that 2p + 1 < N. One can readily verify
that this autocorrelation estimate is unbiased and that its variance
generally increases for increasing values of the lag index n (e.g., see
ref. [7]). This statistical behavior reflects a growing lack of confidence
in the autocorrelation estimate as n increases. This confidence factor will
be taken into account in what is to follow.
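
Estimation rule (13) translates directly into code. The following is a minimal
Python sketch with a hypothetical function name; the list index k starts at
zero, whereas the text counts observations from one.

```python
def unbiased_autocorr(x, max_lag=None):
    """Lag estimates (1/(N-n)) * sum_k x(k) x(n+k) for n = 0..max_lag."""
    n_samples = len(x)
    if max_lag is None:
        max_lag = n_samples - 1
    estimates = []
    for lag in range(max_lag + 1):
        products = n_samples - lag  # number of terms available at this lag
        total = sum(x[k] * x[k + lag] for k in range(products))
        estimates.append(total / products)
    return estimates


lags = unbiased_autocorr([1.0, 2.0, 3.0])
```

At the largest lag only one product contributes to the average, which is the
source of the growing variance noted above.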

We next seek to determine values for the a_k, c_k coefficients governing
model (12) which will be most consistent with these generated
autocorrelation sequence estimates. A generally accepted measure of
consistency is provided by the mean squared error criterion as given by

    f(a_k, c_k) = \sum_{n=0}^{N-1} w(n) [\hat{r}_x(n) - r_x^+(n)]^2        (14)

where w(n) is a nonnegative weighting sequence used to reflect our
decreasing confidence in \hat{r}_x(n) as n increases. Our objective is then
to select the a_k, c_k coefficients so that the sequence r_x^+(n) as
generated by model (12) best approximates \hat{r}_x(n) in the sense of
minimizing criterion (14). This is an analytically intractable problem and
its eventual solution necessitates an algorithmic approach. The ultimate
success of the spectral estimation procedure here described depends
critically upon the algorithm used. The linearization algorithm, as described
in references [8] and [9], has proven to be a significantly more effective
tool than the standard gradient method [10]. As with all algorithms, the
initial coefficient selection plays an important role with regard to how
quickly the linearization algorithm converges and to which relative minimum
it converges.


A  particularly  effective  initial  coefficient  selection  method  is  to  be 
found  in  ref.  [11]. 
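To make criterion (14) concrete, the sketch below evaluates the weighted squared error between a model-generated autocorrelation sequence and an estimated one. It assumes NumPy; the function name `weighted_sq_error` and all numerical values are hypothetical illustrations, and the nonlinear search over the coefficients (the linearization algorithm) is not attempted here.

```python
import numpy as np

def weighted_sq_error(r_model, r_hat, w):
    """Criterion (14): sum over lags of w(n) * [r_model(n) - r_hat(n)]^2."""
    r_model, r_hat, w = (np.asarray(v, dtype=float) for v in (r_model, r_hat, w))
    if np.any(w < 0):
        raise ValueError("weights must be nonnegative")
    return float(np.sum(w * (r_model - r_hat) ** 2))

# Illustrative values only (not taken from the paper).
r_hat = np.array([2.0, 1.0, 0.5, 0.1])    # estimated lags 0..3
r_model = np.array([2.0, 1.1, 0.4, 0.3])  # lags produced by a candidate model
w = np.array([4.0, 3.0, 2.0, 1.0])        # decreasing confidence with lag
cost = weighted_sq_error(r_model, r_hat, w)
```

With decreasing weights, a given mismatch at a high lag contributes less to the cost than the same mismatch at a low lag.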

A summary of the proposed spectral estimation method is outlined
in Table 2. It is important to note that the particular autocorrelation
estimate and the optimum model algorithm to be used have not been specified.
Suggested procedures have been offered in this section, but the most
effective selections remain a subject of future research.



Step 1: Generate an autocorrelation estimate $\hat{r}_x(n)$ from the N data
        observations.

Step 2: Determine the causal autocorrelation model (12) which is most
        consistent with the estimated autocorrelation sequence obtained
        in Step 1. One may employ a criterion other than mean squared
        error for measuring this consistency.

Step 3: Construct the spectral estimate using the relationship

        $$\hat{S}_x(e^{j\omega}) = \hat{S}_x^+(e^{j\omega}) + \hat{S}_x^+(e^{-j\omega}) - \hat{r}_x(0) = 2\,\mathrm{Re}[\hat{S}_x^+(e^{j\omega})] - \hat{r}_x(0) \tag{15}$$

        in which the most consistent model $\hat{S}_x^+(z)$ found in Step 2
        is used.

Table 2. Basic Steps of the Proposed Spectral Estimation Method.


IV.  NUMERICAL  EXAMPLES 

A  spectral  estimation  problem  which  arises  in  a  surprisingly  large 
number  of  applications  is  that  of  the  detection  and  parameter  identification 
of  sampled  sinusoids  from  noise  contaminated  measurements.  This  particular 
class  of  problems  serves  as  an  effective  means  for  measuring  the  performance 
of  spectrum  estimators  relative  to  (i)  detecting  the  presence  of  sinusoids 
when  the  additive  noise  is  strong,  and  (ii)  resolving  two  or  more  sinusoids 
whose  frequencies  are  closely  spaced.  In  this  section,  we  shall  apply  the 
autocorrelation  estimation  method  (AEM)  to  estimate  the  spectrum  of  the 
fourth  order  ARMA  generated  data  as  governed  by 


$$x(n) = A \sin[0.4\pi n] + A \sin[0.426\pi n] + w(n), \qquad 0 \le n \le 63 \tag{16}$$


in which {w(n)} is a zero-mean white Gaussian noise process with variance
one. It is to be noted that the frequency spacing of the constituent
sinusoids (i.e., $0.026\pi$) is less than the resolution capability of the
standard discrete Fourier transform (i.e., $2\pi/64 = 0.03125\pi$). This
particular problem has been considered in detail in reference [12], where the
performance of some of the more commonly used spectral estimators was
compared. The individual sinusoidal signal-to-noise ratio for the above
signal, as expressed in decibels, is given by $10 \log_{10}(A^2/2)$. In order
to determine the effectiveness of the AEM in different noise environments, we
shall consider two sinusoidal amplitude parameter selections.
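For readers who wish to reproduce the test signal, the following sketch generates one realization of data (16) and evaluates the per-sinusoid SNR formula. It assumes NumPy; the function names, the random seed, and the choice of generator are illustrative assumptions, so the noise realization will not match the one used for the figures.

```python
import numpy as np

def two_sinusoid_data(A, N=64, seed=0):
    """One realization of (16): two equal-amplitude sinusoids plus unit-variance white noise."""
    rng = np.random.default_rng(seed)       # seed chosen arbitrarily
    n = np.arange(N)                        # n = 0, 1, ..., N-1
    w = rng.standard_normal(N)              # zero-mean, variance-one Gaussian noise
    return A * np.sin(0.4 * np.pi * n) + A * np.sin(0.426 * np.pi * n) + w

def snr_db(A, noise_var=1.0):
    """Per-sinusoid SNR in dB: 10 log10((A^2 / 2) / noise_var)."""
    return 10.0 * np.log10(A ** 2 / (2.0 * noise_var))

x_strong = two_sinusoid_data(np.sqrt(2000.0))   # strong signal case
x_weak = two_sinusoid_data(np.sqrt(2.0))        # weak signal case
snr_strong = snr_db(np.sqrt(2000.0))
snr_weak = snr_db(np.sqrt(2.0))
```

The frequency separation of $0.026\pi$ is smaller than the DFT bin spacing $2\pi/64$, which is why a 64-point periodogram cannot be expected to separate the two lines.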


CASE I: $A = \sqrt{2000}$


When the sinusoid's amplitude is set at $A = \sqrt{2000}$, the prevailing
SNR is 30 dB. In this strong signal case, we shall be testing the spectral
estimator's ability to resolve closely spaced (in frequency) sinusoids, and
to accurately estimate their frequencies. Upon generation of the postulated
sequence (16), the autocorrelation estimate was generated according to
the unbiased rule (13). The linearization algorithm was next employed to
obtain the best fourth order (i.e., p = 4) ARMA autocorrelation model, in
which a weighting sequence w(n) equal to a power of (N - n) was selected so
as to reflect our decreasing confidence in the autocorrelation estimates for
increasing values of n. This ARMA model was then used to generate the
required spectral estimate according to


$$\hat{S}_x(e^{j\omega}) = 2\,\mathrm{Re}[\hat{S}_x^+(e^{j\omega})] - \hat{r}_x(0) \tag{17}$$


A plot of this AEM estimate over the normalized frequency interval is
shown in Figure 2a, where the frequency resolution capabilities are clearly
evident. The estimated center frequencies were found to correspond almost
exactly to the sinusoids used in generating the data. In this and the plots
to follow, the spectral peak has been normalized to 30 dB, and no special
routine has been employed to determine the amplitudes of the constituent
sinusoids from this spectral estimate.

For comparison purposes, a covariance AR spectral estimate of
order fifteen was next generated using data (16). As demonstrated in
reference [12], this particular AR spectral estimator works particularly
well for this class of problems. The results of this covariance AR
spectral estimate are shown in Figure 2b. As might be anticipated, the
covariance AR method also yields excellent resolution capabilities in this
high SNR environment. The estimated center frequencies obtained were also
of good quality.


Figure 2. Plot of Spectral Estimate in Strong Signal Case, SNR = 30 dB
(horizontal axis: Normalized Frequency, 0.00 to 1.00).

(a) Fourth Order AEM Spectral Estimate.

(b) Fifteenth Order AR Spectral Estimate.


CASE II: $A = \sqrt{2}$


For this sinusoidal amplitude selection, the prevailing SNR is
0 dB. Using the same procedure as in Case I, a fourth order AEM
spectral estimate for this low SNR case was found and is displayed in
Figure 3a. Significantly, we are still able to detect the presence of
two sinusoids and obtain reasonable estimates of the sinusoidal
frequencies (i.e., 0.392 and 0.430). On the other hand, when a 15th order AR
spectral estimate is generated from this data, the resultant spectrum
plotted in Figure 3b demonstrates a failure to detect the two sinusoids.
This inability can be attributed to the fact that the actual spectrum
contains zeroes (due to the strong white noise) close to the unit circle,
which results in an AR spectral model mismatch. Thus, in this hostile
noise environment, the AEM spectral estimator produces clearly superior
results.


V.  CONCLUSION 

The autocorrelation estimation method (AEM) for generating an ARMA
rational spectral estimate has been presented. This procedure offers the
promise of achieving effective spectral estimation performance without
requiring an excessively large number of data samples. In order
to reach its full potential, however, a number of fundamental issues have
to be resolved. Perhaps the most important of these involve the specific
procedure used in determining the autocorrelation estimates, and the
selection of error weights used in the squared error criterion. The former is
most critical since, if "poor" autocorrelation estimates are used in
generating the optimum ARMA model, one cannot possibly hope to achieve an
accurate spectral estimate. On the other hand, even with acceptably good
autocorrelation estimates, a proper weighting of model error is required in
order to reflect the growing lack of confidence in the autocorrelation
estimates for increasing values of n. It is felt that the weights to be used
should be data dependent. Some other issues which must also be resolved are
(i) determining the order of the ARMA model, (ii) investigating criteria
other than squared error, and (iii) developing fundamentally different
procedures which make use of the causal autocorrelation concept herein
presented for obtaining spectral estimates.


ACKNOWLEDGMENT 

I would like to take this opportunity to acknowledge the
contributions of Koji Ogino and Behshad Baseghi.


Figure 3. Plot of Spectral Estimate in Weak Signal Case, SNR = 0 dB
(vertical axis: Spectrum in dB; horizontal axis: Normalized Frequency).

(a) Fourth Order AEM Spectral Estimate.

(b) Fifteenth Order AR Spectral Estimate.


VI.  REFERENCES 


[1]  R. B. Blackman and J. W. Tukey, THE MEASUREMENT OF POWER SPECTRA
     FROM THE POINT OF VIEW OF COMMUNICATIONS ENGINEERING, Dover, New York,
     1959.

[2]  E. Cheney, INTRODUCTION TO APPROXIMATION THEORY, McGraw-Hill, New York,
     1966.

[3]  J. P. Burg, "Maximum Entropy Spectral Analysis," Proceedings of the
     37th Meeting of the Society of Exploration Geophysicists, 1967.

[4]  J. Makhoul, "Linear Prediction, A Tutorial Review," Proc. IEEE,
     Vol. 63, pp. 561-580, April 1975.

[5]  P. R. Gutowski, E. A. Robinson, and S. Treitel, "Spectral Estimation:
     Fact or Fiction," IEEE Trans. Geoscience Electronics, Vol. GE-16,
     No. 2, pp. 80-84, April 1978.

[6]  S. A. Tretter and K. Steiglitz, "Power-Spectrum Identification in
     Terms of Rational Models," IEEE Trans. Automatic Control, Vol. AC-12,
     pp. 185-188, April 1967.

[7]  S. A. Tretter, DISCRETE-TIME SIGNAL PROCESSING, John Wiley and Sons,
     New York, 1976.

[8]  J. A. Cadzow, "Recursive Digital Filter Synthesis via Gradient Based
     Algorithms," IEEE Trans. Acoust., Speech, and Signal Processing,
     Vol. ASSP-24, No. 5, pp. 349-355, October 1976.

[9]  J. A. Cadzow, "Linear Recursive Modeling," Presented at the 1975
     Pittsburgh Conf. Modeling and Simulation, University of Pittsburgh,
     April 1975.

[10] G. W. Bordner, "Time Domain Design of Stable Recursive Digital Filters,"
     Ph.D. Thesis, State University of N.Y. at Buffalo, 1974.

[11] J. A. Cadzow, "ARMA Spectral Estimation: An Efficient Closed-Form
     Procedure," 1979 RADC Spectral Estimation Workshop, October 1979.

[12] T. M. Sullivan, O. L. Frost, and J. R. Treichler, "High Resolution Signal
     Estimation," ARGO Systems, Inc. Tech. Rept., June 1978.




ARMA SPECTRAL ESTIMATION:


AN  EFFICIENT  CLOSED-FORM  PROCEDURE 


James  A.  Cadzow 

Department  of  Electrical  Engineering 
Virginia  Polytechnic  Institute  and  State  University 
Blacksburg, Virginia 24061
(703)  961-569H 


ABSTRACT 


A closed-form procedure for generating an ARMA spectral estimate of a
stationary random time series, based upon a finite set of contiguous
observations, is presented. As in the maximum entropy method, this procedure
in effect extrapolates an autocorrelation estimate beyond the data limited
range, thereby offering the possibility for improved spectral resolution in
comparison to the more classical Fourier based approaches. Unlike the maximum
entropy method, however, this procedure has the additional flexibility of
generating a spectral model which possesses zeroes as well as poles. As such,
it has a more robust behavior and therefore the capability of producing
superior spectral estimation performance. This latter claim has been
empirically confirmed for a number of examples in which these two methods
have been applied. Significantly, the computational requirements of the two
procedures are comparable. This suggests that the herein developed ARMA
spectral estimator can be used as a primary tool in spectral estimation.

I. INTRODUCTION


A signal processing problem which arises in a variety of interdisciplinary
applications is that of estimating the spectrum of a stationary random time
series. This estimation is to be based wholly on a set of N contiguous
observations of that time series as represented by

$$x(1), x(2), \ldots, x(N) \tag{1}$$

The inability to monitor the entire history of the infinite length time series
reflects constraints which prevail in virtually all real world applications.
Unless some assumptions are made relative to the statistical structure of the
underlying time series, the generation of a spectral estimate from


Research sponsored by the Air Force Office of Scientific Research/AFSC,
United States Air Force, under Contract F49620-79-C-003o. The United
States Government is authorized to reproduce and distribute reprints
for governmental purposes notwithstanding any copyright notation herein.




this finite data is a poorly posed problem. This is a direct consequence of
the fact that the spectral content of a time series is completely specified by
its associated autocorrelation sequence

$$r_x(n) = E\{x(k)\,x(k+n)\} \qquad n = 0, \pm 1, \pm 2, \ldots \tag{2}$$

in which E denotes the expected value operator. Clearly, there exists a basic
information content incompatibility between the infinite extent
autocorrelation sequence and the finite set of time series observations upon
which the spectral estimate is to be based. This incompatibility is usually
resolved through the process of parameterizing the underlying spectrum in some
logical manner.


The power spectral density corresponding to the stationary time series
{x(n)} is defined to be the z-transform of the associated autocorrelation
sequence (2), that is

$$S_x(z) = \sum_{n=-\infty}^{\infty} r_x(n)\, z^{-n} \tag{3}$$

where z is a complex valued variable. In the spectral estimation literature,
the z variable is usually replaced by $e^{j\omega}$, thereby yielding the
equivalent Fourier transform characterization as designated by
$S_x(e^{j\omega})$. With this latter representation, one can interpret the
spectrum as being a function of the real frequency variable $\omega$.


In classical spectral estimation, one utilizes a standard Fourier transform
based method, such as the periodogram, to effect the spectral estimate.
The primary drawback in these Fourier based methods resides in the inherent
assumption that the time series is identically zero (or periodic) outside the
observation window $1 \le n \le N$. Generally, this is a very unrealistic
assumption to make in practical applications (e.g., radar doppler processing)
and will usually result in poor spectral estimation performance. In
recognition of this fact, a number of modern spectral estimation procedures
have been developed over the past decade to counteract this deficiency. By and
large, the typical modern spectral estimation procedure models the spectrum as
a rational function. Such a model can be justified on the basis that any
continuous power spectral density can be approximated arbitrarily closely by
a rational function of sufficiently high order [1].

The most widely used rational spectral model is the so-called all-pole
model which has given rise to the essentially equivalent autoregressive,
linear predictive coding, and maximum entropy methods of spectral estimation.
A set of basic papers treating these and other spectral estimation procedures
is to be found in ref. [2]. All-pole spectral estimators are capable of
providing increased resolution in comparison to the classical methods when
only a small number of time series observations are available. It must be
noted, however, that if the time series spectrum is a rational function which
possesses zeroes as well as poles, then an all-pole estimator can yield poor
spectral estimates. Clearly, the ability to generate a zero-pole spectrum
model provides for a potentially more robust estimator in comparison with the
standard all-pole model. With this in mind, a number of zero-pole spectral



estimation methods have been developed. They include estimators which utilize
the so-called whitening filter concept (e.g., [3] and [4]). Unfortunately, this
class of spectral estimation procedures is iterative in nature and typically
requires a relatively large number of time series observations to be
effective. Another approach, which makes use of the recursive nature of the
time series autocorrelation sequence and does not share these liabilities, was
developed by Box and Jenkins [5]. A modification of this method involving a
more efficient noniterative method for generating the moving average
coefficients was recently proposed in [6] and [7].

In this paper, a zero-pole spectral estimator is developed which also makes
use of the recursive nature of the autocorrelation sequence. It distinguishes
itself from the Box and Jenkins method, however, in that a least squares fit
to a set of equation errors is used to generate the autoregressive
coefficients, and a noniterative procedure for generating the moving average
coefficients is offered. Significantly, the proposed spectral estimator has
been empirically found to produce superior estimation performance when
compared with the Box and Jenkins method and its variants.

II.  RATIONAL  SPECTRUM  MODEL 

One  of  the  most  widely  used  models  for  spectral  estimation  is  the  rational 
model.  The  stochastic  time  series  {x(n)}  is  said  to  have  a  rational  power 
spectrum  if  its  power  spectral  density  can  be  expressed  in  the  form 

$$S_x(z) = H(z)\, H(z^{-1})\, \sigma^2 \tag{4}$$

where $\sigma^2$ is a positive constant and the characteristic rational function

$$H(z) = \frac{B(z)}{A(z)} = \frac{1 + b_1 z^{-1} + \cdots + b_q z^{-q}}{1 + a_1 z^{-1} + \cdots + a_p z^{-p}} \tag{5}$$


is composed of polynomials A(z) and B(z) which have real coefficients and have
zeroes wholly contained within the unit circle. The rational power spectral
density (4) is said to have order (p,q) and its zeroes and poles are seen to
occur in sets of complex conjugate reciprocals. For reasons which will
shortly be made clear, we shall refer to the $a_k$ and $b_k$ coefficients as
the autoregressive and moving average coefficients, respectively.


A particularly convenient interpretation of how a stochastic time series
with rational spectrum may arise follows directly from the characteristic
rational function. This entails treating the characteristic rational function
(5) as being the transfer function of a causal, time-invariant linear system.
It then follows that this system will be characterized by the recursive
equation

$$x(n) = \sum_{i=0}^{q} b_i\, e(n-i) - \sum_{i=1}^{p} a_i\, x(n-i) \tag{6}$$




where $b_0 = 1$ and the time series {e(n)} and {x(n)} are taken to be the
excitation and response signals, respectively. It is well known that when this
system is excited by a stationary white noise time series as statistically
characterized by

$$E\{e(n)\} = 0 \quad \text{and} \quad r_e(n) = \sigma^2\, \delta(n) \tag{7}$$

the power spectral density of the response time series is given precisely
by relationship (4). Thus, a stationary random time series with rational
power spectral density can be interpreted as being the response of a causal,
time-invariant linear system to a white noise excitation. This linear system
is then said to have colored the white noise excitation process (i.e.,
$S_e(z) = \sigma^2$) and for this reason it is commonly referred to as a
coloring filter, as suggestively depicted in Figure 1.


[Block diagram: white noise e(n) -> coloring filter -> colored noise x(n)]

FIGURE 1: Model for a Rational Spectrum Generator


The general linear system (6) is commonly referred to as an autoregressive-
moving average (ARMA) model in the spectral estimation literature. This ARMA
model is said to be of order (p,q) and it gives rise to the rational spectrum
(4) which possesses both zeroes (via B(z)) as well as poles (via A(z)). The
ARMA model is the most general of rational spectrum models possible and its
$a_k$ and $b_k$ coefficients uniquely characterize the spectrum.
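The recursion (6) can be run directly to generate an ARMA response. The sketch below is an illustration only, assuming NumPy and zero initial conditions; the function name `arma_response` and the first-order coefficients in the example are hypothetical choices, not values from the paper.

```python
import numpy as np

def arma_response(b, a, e):
    """Run recursion (6) with zero initial conditions.

    b = [b_0, b_1, ..., b_q] with b_0 = 1 (moving average coefficients)
    a = [a_1, ..., a_p]               (autoregressive coefficients)
    e = excitation samples e(0), e(1), ...
    """
    e = np.asarray(e, dtype=float)
    x = np.zeros(len(e))
    for n in range(len(e)):
        ma_part = sum(b[i] * e[n - i] for i in range(len(b)) if n - i >= 0)
        ar_part = sum(a[i - 1] * x[n - i] for i in range(1, len(a) + 1) if n - i >= 0)
        x[n] = ma_part - ar_part
    return x

# Impulse response of H(z) = (1 + 0.5 z^-1) / (1 - 0.9 z^-1), i.e. a_1 = -0.9.
impulse = np.zeros(4)
impulse[0] = 1.0
h = arma_response([1.0, 0.5], [-0.9], impulse)

# Driving the same filter with white noise yields a "colored" sequence.
rng = np.random.default_rng(1)              # arbitrary seed
x = arma_response([1.0, 0.5], [-0.9], rng.standard_normal(256))
```

Feeding an impulse rather than noise is a convenient check, since the output must then equal the power series coefficients of B(z)/A(z).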


In the spectral estimation literature, the preponderance of activity has
been directed towards the special class of ARMA models known as autoregressive
(AR) models. An AR model is one in which the numerator polynomial B(z) is
equal to the constant one (i.e., $b_k = 0$ for $k \ne 0$). As such, the AR
model is also referred to as an all-pole model since its transfer function is
specified by

$$H(z) = \frac{1}{A(z)} \tag{8}$$

1 The Kronecker delta sequence $\delta(n)$ is defined by $\delta(n) = 1$ for
  $n = 0$ and $\delta(n) = 0$ otherwise.




This all-pole model is the one most often used in spectral estimation,
primarily due to the ease with which one can compute the $a_k$ coefficients
that correspond to a given finite set of time series observations. It should
be noted that it is always possible to approximate a general ARMA model by an
AR model in the following manner

$$H(z) = \frac{1}{A(z)\,[1/B(z)]} \approx \frac{1}{A(z)\, A_1(z)} \tag{9}$$

whereby the polynomial $A_1(z)$ is obtained by suitably truncating the power
series 1/B(z) as generated by long division. Clearly, the effectiveness of
this approach is dependent on how quickly the coefficients of the long
division 1/B(z) converge to zero. If B(z) has a zero very close to the unit
circle, this convergence rate will be extremely slow, thereby making
impractical the approximation of an ARMA model by a reasonably low order AR
model. It can be conjectured that this is one of the main reasons why AR
models fail to yield satisfactory spectral estimates of time series composed
of sinusoidal samples in a strongly noisy environment (an ARMA process).
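The slow convergence of the long division 1/B(z) is easy to observe numerically. The sketch below, which assumes NumPy, computes the power series coefficients of 1/B(z) and counts how many must be retained before they fall below a tolerance. The function names, the tolerance of 0.01, and the first-order B(z) examples are illustrative assumptions.

```python
import numpy as np

def inverse_series(b, n_terms):
    """First n_terms coefficients g(n) of 1/B(z), where
    B(z) = 1 + b_1 z^-1 + ... + b_q z^-q and b = [b_1, ..., b_q].
    Long division gives g(0) = 1 and g(n) = -sum_i b_i g(n - i)."""
    g = np.zeros(n_terms)
    g[0] = 1.0
    for n in range(1, n_terms):
        g[n] = -sum(b[i - 1] * g[n - i] for i in range(1, min(n, len(b)) + 1))
    return g

def terms_needed(b, tol=1e-2, n_max=2000):
    """Number of leading coefficients kept so that all later ones (up to n_max) are below tol."""
    g = inverse_series(b, n_max)
    significant = np.nonzero(np.abs(g) >= tol)[0]
    return int(significant[-1]) + 1 if len(significant) else 0

far_zero = terms_needed([-0.5])     # B(z) = 1 - 0.5 z^-1, zero at z = 0.5
near_zero = terms_needed([-0.99])   # B(z) = 1 - 0.99 z^-1, zero at z = 0.99
```

In this toy case a zero at 0.5 needs only a handful of terms, whereas a zero at 0.99 needs several hundred, which illustrates why a low order $A_1(z)$ cannot stand in for 1/B(z) when zeroes lie near the unit circle.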

Another  subclass  of  rational  spectrum  models  which  has  received  attention 
is  the  so-called  moving  average  (MA)  model  as  characterized  by  A(z)  =  1.  The 
transfer  function  of  a  MA  model  is  given  by  B(z)  and  it  is  therefore  also 
referred  to  as  an  all-zero  model.  With  these  thoughts  in  mind,  it  is  apparent 
that  a  general  ARMA  model  is  composed  of  the  cascading  of  an  AR  with  an  MA 
model.  The  rational  spectrum  associated  with  each  of  these  models  is  displayed 
in  Table  1. 


TABLE  1.  Rational  Spectrum  Models 
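As a worked instance of (4), substituting the transfer function of each model class gives the associated spectra. The summary below is derived here from (4), (5) and the AR and MA definitions in the text; it is offered as a sketch of the three cases and should not be read as a reproduction of the table's original layout.

$$
\begin{aligned}
\text{AR:}   \quad & S_x(z) = \frac{\sigma^2}{A(z)\,A(z^{-1})} \\
\text{MA:}   \quad & S_x(z) = \sigma^2\, B(z)\,B(z^{-1}) \\
\text{ARMA:} \quad & S_x(z) = \sigma^2\, \frac{B(z)\,B(z^{-1})}{A(z)\,A(z^{-1})}
\end{aligned}
$$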


An examination of Table 1 reveals the greater flexibility which the ARMA
model possesses in providing rational spectral estimates. This robustness was
recently demonstrated in a study in which the ARMA model was found to provide
the overall best spectral estimates for a variety of problems [3]. Unless one
has a priori knowledge which would indicate otherwise, it seems clear that the
ARMA model is the one to utilize when seeking a rational spectrum model.
Hereafter, we shall concern ourselves with the practical task of developing
feasible procedures for determining the "optimum" coefficients of an ARMA
model based on a finite set of time series measurements.




III.  FUNDAMENTAL  AUTOCORRELATION  RECURSIVE  RELATIONSHIP 


For reasons alluded to in the last section, there exists a basic
incompatibility in generating an ARMA spectral model, which is most consistent
with a given set of time series observations, when the number of observations
is small. This incompatibility can be, to a large extent, alleviated by
appealing to a fundamental recursive relationship characterizing ARMA time
series. This relationship is obtained by analyzing the "causal image" of the
autocorrelation sequence as defined by

$$r_x^+(n) = \begin{cases} r_x(n) & n \ge 0 \\ 0 & n < 0 \end{cases} \tag{10}$$


Since  the  autocorrelation  sequence  of  a  real  valued  time  series  is  an  even 
function  of  n,  it  is  apparent  that  one  can  reconstruct  the  autocorrelation 
sequence  from  its  causal  image  according  to 


$$r_x(n) = r_x^+(n) + r_x^+(-n) - r_x^+(0)\,\delta(n) \tag{11}$$


Upon taking the z-transform of this expression, the desired power spectral
density is found, that is

$$S_x(z) = S_x^+(z) + S_x^+(z^{-1}) - r_x(0) \tag{12}$$

where the function $S_x^+(z)$ denotes the z-transform of the causal image
sequence (10). Thus, a power spectral density estimate may be equivalently
accomplished by estimating the function $S_x^+(z)$. This will be the approach
taken in this paper.
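Relationship (12) can be checked numerically on a simple case. The sketch below assumes the illustrative autocorrelation $r_x(n) = \rho^{|n|}$ with $\rho = 0.8$ (a choice made here, not in the paper), for which the causal image has the closed-form transform $S_x^+(z) = 1/(1 - \rho z^{-1})$, and compares the spectrum built from the causal image with a truncated two-sided sum of definition (3). NumPy is assumed.

```python
import numpy as np

rho = 0.8                                  # illustrative value, |rho| < 1
omega = np.linspace(0.0, np.pi, 9)         # a few test frequencies

# Causal image transform on the unit circle: sum_{n>=0} rho^n e^{-j n omega}.
S_plus = 1.0 / (1.0 - rho * np.exp(-1j * omega))

# Relationship (12) with z = e^{j omega}; here r_x(0) = 1.
S_from_causal = 2.0 * S_plus.real - 1.0

# Direct two-sided sum of definition (3), truncated at |n| <= 200.
lags = np.arange(-200, 201)
S_direct = np.array([np.sum(rho ** np.abs(lags) * np.exp(-1j * w * lags)).real
                     for w in omega])

# Known closed form for this autocorrelation sequence.
S_closed = (1.0 - rho ** 2) / (1.0 - 2.0 * rho * np.cos(omega) + rho ** 2)
```

All three arrays agree to within the truncation error of the direct sum, so in this example estimating $S_x^+(z)$ is equivalent to estimating $S_x(z)$.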


When the underlying power spectral density is of the rational form (4),
a little thought should convince oneself that the function $S_x^+(z)$ must be
of the specific rational form

$$S_x^+(z) = \frac{c_0 + c_1 z^{-1} + \cdots + c_p z^{-p}}{1 + a_1 z^{-1} + \cdots + a_p z^{-p}} \tag{13}$$

in which the denominator polynomial is identical to the A(z) polynomial that
in part characterizes $S_x(z)$.¹ Upon multiplying both sides of this equation
by the polynomial A(z) and then taking the inverse z-transform, one readily
arrives at the following fundamental recursive relationship

$$r_x^+(n) = \sum_{i=0}^{p} c_i\, \delta(n-i) - \sum_{i=1}^{p} a_i\, r_x^+(n-i) \tag{14}$$


¹ It is here assumed that the ARMA model of order (p,q) is such that
$p \ge q$. When this is not the case, the degree of the numerator polynomial
must be increased to q.




where the natural boundary conditions $r_x^+(n) = 0$ for $n < 0$ are imposed
to reflect the causality of the sequence $r_x^+(n)$. Thus, the causal image of
an ARMA autocorrelation sequence of order (p,q) is seen to be governed by a
linear difference equation of order p.
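The recursive relationship (14), together with its boundary conditions, can be iterated to extend the causal autocorrelation image to any lag. The following sketch assumes NumPy; the function name `causal_autocorr` and the first-order coefficients used in the example are hypothetical.

```python
import numpy as np

def causal_autocorr(a, c, n_lags):
    """Iterate (14): r+(n) = sum_i c_i delta(n - i) - sum_{i=1}^{p} a_i r+(n - i),
    with r+(n) = 0 for n < 0.

    a = [a_1, ..., a_p], c = [c_0, ..., c_p]; returns r+(0), ..., r+(n_lags - 1).
    """
    r = np.zeros(n_lags)
    for n in range(n_lags):
        forcing = c[n] if n < len(c) else 0.0    # the delta terms act only for n <= p
        feedback = sum(a[i - 1] * r[n - i]
                       for i in range(1, len(a) + 1) if n - i >= 0)
        r[n] = forcing - feedback
    return r

# Illustrative first-order case: a_1 = -0.8, c_0 = 1, c_1 = 0 gives r+(n) = 0.8^n.
r_plus = causal_autocorr([-0.8], [1.0, 0.0], 6)
```

Beyond lag p the forcing terms vanish and the sequence is extrapolated by the autoregressive coefficients alone, which is the mechanism by which the method extends the autocorrelation past the observed lags.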


Upon examining fundamental relationship (14), it is apparent that a
knowledge of the $a_i$, $c_i$ coefficients will enable one to generate the
entire autocorrelation sequence. If it were somehow possible to accurately
estimate these coefficients from the given time series observations, a
particularly effective method of spectral estimation is suggested. Namely,
these coefficient estimates, when substituted into equation (13), will provide
an estimate for $S_x^+(z)$. Using this estimate in relationship (12), and
noting from (14) at n = 0 that $r_x(0) = c_0$, the desired power spectral
density estimate is then obtained

$$S_x(e^{j\omega}) = 2\,\mathrm{Re}\left[\frac{\sum_{k=0}^{p} c_k\, e^{-jk\omega}}{1 + \sum_{k=1}^{p} a_k\, e^{-jk\omega}}\right] - c_0 \tag{15}$$

where use of the fact that $S_x^+(e^{j\omega})$ and $S_x^+(e^{-j\omega})$ are
complex conjugates has been made. We shall now present a procedure for
estimating the $a_i$, $c_i$ coefficients with the ultimate goal of using
relationship (15) for the spectral estimate.
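Once coefficient values are available, the spectral estimate follows by evaluating the rational function (13) on the unit circle and applying (12). The sketch below assumes NumPy and subtracts $r_x(0)$ as required by (12), using the fact that (14) at n = 0 gives $r_x(0) = c_0$. The function name `arma_spectrum` and the example coefficients are illustrative assumptions.

```python
import numpy as np

def arma_spectrum(a, c, omega):
    """Evaluate 2 Re[C(e^{j w}) / A(e^{j w})] - c_0 on a grid of frequencies.

    a = [a_1, ..., a_p], c = [c_0, ..., c_p], omega in radians per sample.
    """
    omega = np.asarray(omega, dtype=float)
    k_c = np.arange(len(c))                 # exponents 0..p of the numerator
    k_a = np.arange(1, len(a) + 1)          # exponents 1..p of the denominator
    num = np.exp(-1j * np.outer(omega, k_c)) @ np.asarray(c, dtype=complex)
    den = 1.0 + np.exp(-1j * np.outer(omega, k_a)) @ np.asarray(a, dtype=complex)
    return 2.0 * (num / den).real - c[0]

# Illustrative first-order case: r_x(n) = 0.8^|n| has a_1 = -0.8, c_0 = 1, c_1 = 0.
omega = np.linspace(0.0, np.pi, 5)
S = arma_spectrum([-0.8], [1.0, 0.0], omega)
S_expected = (1.0 - 0.64) / (1.0 - 1.6 * np.cos(omega) + 0.64)
```

For this test case the result coincides with the familiar first-order autoregressive spectrum, which provides a quick sanity check on the sign conventions.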


IV.  ARMA  MODEL  COEFFICIENT  SELECTION  PROCEDURES 

The most critical step of the proposed spectral estimation method involves
estimating the $a_k$ and $c_k$ coefficients. In this section, the so-called
direct and indirect procedures for accomplishing this task will be described.
The direct approach makes explicit use of the fundamental autocorrelation
relationship derived in the previous section. On the other hand, the more
effective indirect approach uses an alternate formulation which provides a
solution procedure that is consistent with the fundamental autocorrelation
relationship.

Direct  Method 

In the direct method, one first generates estimates of the autocorrelation
sequence from the given time series observations using some convenient
method.¹ These estimates, denoted as $\hat{r}_x(n)$, are then substituted into
fundamental relationship (14). In recognition that the autocorrelation
estimates will generally be in error, and that the ARMA model order parameter
p may be incorrect, it follows that this substitution will give rise to the
following "equation error" sequence

$$e(n) = \hat{r}_x(n) + \sum_{i=1}^{p} a_i\, \hat{r}_x(n-i) - \sum_{i=0}^{p} c_i\, \delta(n-i) \qquad 0 \le n \le N-1 \tag{16}$$

in which $\hat{r}_x(n) = 0$ for $n < 0$.

¹ As an example, one might use the biased estimator

$$\hat{r}_x(n) = \frac{1}{N} \sum_{k=1}^{N-n} x(k)\, x(k+n)$$




Our objective will be that of selecting the model's $a_i$, $c_i$ coefficients
so as to minimize these equation errors in some sense. For reasons of
mathematical tractability and subsequently demonstrated effectiveness, the
equation error criterion to be minimized is taken to be the quadratic
functional

$$f(a_i, c_i) = \sum_{n=0}^{N-1} w(n)\, e^2(n) \tag{17}$$

The nonnegative weights, w(n), are usually selected to be monotonically
nonincreasing (i.e., $w(n) \ge w(n+1)$) so as to reflect an anticipated
degradation in equation error accuracy for increasing n. This degradation
behavior arises primarily from a loss in autocorrelation estimate fidelity for
increasing lags (i.e., n).

In minimizing this functional with respect to the coefficients, it is
apparent from relationship (16) that the $c_i$ coefficients have no effect
whatsoever on the e(n) for n > p. This being the case, it follows that the
optimum $c_i$ coefficients must be given by

$$c_n^o = \hat{r}_x(n) + \sum_{i=1}^{p} a_i\, \hat{r}_x(n-i) \qquad 0 \le n \le p \tag{18}$$

since such a selection will render the equation errors, e(n), identically zero
over $0 \le n \le p$ for "any" choice of the $a_i$ autoregressive
coefficients. It then follows that the optimum autoregressive coefficients
must render the remaining terms (i.e., p < n < N) of the quadratic functional
a minimum.

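With this in mind, let us express this specific set of equation errors in
the matrix format

$$\begin{bmatrix} e(p+1) \\ e(p+2) \\ \vdots \\ e(N-1) \end{bmatrix} = \begin{bmatrix} \hat{r}_x(p) & \hat{r}_x(p-1) & \cdots & \hat{r}_x(1) \\ \hat{r}_x(p+1) & \hat{r}_x(p) & \cdots & \hat{r}_x(2) \\ \vdots & \vdots & & \vdots \\ \hat{r}_x(N-2) & \hat{r}_x(N-3) & \cdots & \hat{r}_x(N-p-1) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} + \begin{bmatrix} \hat{r}_x(p+1) \\ \hat{r}_x(p+2) \\ \vdots \\ \hat{r}_x(N-1) \end{bmatrix} \tag{19a}$$

where use of relationship (16) for n > p has been made. This matrix system of
equations can be conveniently expressed as

$$e = Ra + r \tag{19b}$$

in which a is the p x 1 autoregressive coefficient vector with elements $a_k$,
e and r are each (N-p-1) x 1 vectors with elements e(p+n) and
$\hat{r}_x(p+n)$, respectively, and R is a (N-p-1) x p Toeplitz matrix.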


A little thought shows that, for the optimum c_i coefficient selection given
by relationship (18), the quadratic functional (17) may be equivalently
expressed as

    f(a, c^o) = [Ra + r]' W [Ra + r]                                    (20)

in which W is a positive semidefinite (N-p-1) x (N-p-1) diagonal matrix whose
diagonal elements are given by w_nn = w(p+n) for n = 1, 2, ..., N-p-1. The
minimization of this quadratic functional with respect to the autoregressive
coefficient vector is straightforward and results in the following system of
p linear equations for the required optimum autoregressive coefficient vector

    [R'WR] a^o = -R'Wr                                                  (21)

One then solves this system of linear equations to obtain the desired optimum
autoregressive coefficients. Upon substitution of these autoregressive
coefficients into relationship (18), the optimum c_i^o coefficients are next
determined. Finally, the desired power spectral density estimate is obtained
by substituting these optimum a_i^o, c_i^o coefficients into relationship (15).
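
The weighted least squares step in (19) through (21) can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's implementation: the function name ar_coefficients and the test autocorrelation sequence are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def ar_coefficients(r, p, w):
    """Solve [R'WR] a = -R'W r, equation (21), for the AR coefficients.

    r : autocorrelation values r_x(0), ..., r_x(N-1)
    p : autoregressive order
    w : nonnegative weights w(0), ..., w(N-1)
    """
    r = np.asarray(r, dtype=float)
    w = np.asarray(w, dtype=float)
    N = len(r)
    lags = np.arange(p + 1, N)                    # n = p+1, ..., N-1
    # Row n of R holds r_x(n-1), r_x(n-2), ..., r_x(n-p), as in (19).
    R = np.array([[r[n - i] for i in range(1, p + 1)] for n in lags])
    r_vec = r[lags]                               # r_x(p+1), ..., r_x(N-1)
    W = np.diag(w[lags])                          # w(p+1), ..., w(N-1)
    return np.linalg.solve(R.T @ W @ R, -R.T @ W @ r_vec)

# Hypothetical check: r_x(n) = 0.5**n satisfies r_x(n) - 0.5 r_x(n-1) = 0,
# so a first order fit should recover a_1 close to -0.5.
N = 10
a = ar_coefficients(0.5 ** np.arange(N), p=1, w=N - np.arange(N))
print(a)
```

The weights w(n) = N-n used in the check are the ones quoted in the numerical examples; any nonnegative, nonincreasing sequence could be substituted.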

It is of interest to note that the system of equations for the
autoregressive coefficients (21) reduces to the Box-Jenkins method for a
weighting selection of w(n+p) = 1 for 1 ≤ n ≤ p and zero otherwise.
Unfortunately, this particular weighting selection implicitly assumes that the
equation errors e(n) for p+1 ≤ n ≤ 2p all have the same statistical behavior.
More realistically, one would presume that the equation errors become more
random as n increases.

It is then conjectured that the Box-Jenkins method fails to provide adequate
spectral estimates for certain problems primarily because of this particular
weighting choice and because it makes no use whatsoever of the fundamental
autocorrelation relationship (14) for n > 2p.

Indirect  Method 

Although the direct method has been found to provide satisfactory spectral
estimation performance, the indirect approach now briefly described has
yielded significantly better performance. Its development is based on the
coloring filter's characteristic equation (6), and the fact that the random
variables x(n) and ε(m) are uncorrelated for m > n. To begin this development,
we first replace the variable n appearing in relationship (6) by k. Next,
each side of this characteristic equation is multiplied by x(k-n)/(N-n)
to obtain

    \frac{x(k) x(k-n)}{N-n}
      = \left[ \sum_{i=0}^{q} b_i \epsilon(k-i) - \sum_{i=1}^{p} a_i x(k-i) \right]
        \frac{x(k-n)}{N-n}




If both sides of this equality are then summed over the index range n < k ≤ N,
after rearrangement one obtains

    \tilde{e}(n) = \sum_{i=1}^{p} \left[ \frac{1}{N-n} \sum_{k=n+1}^{N} x(k-i) x(k-n) \right] a_i
                   + \left[ \frac{1}{N-n} \sum_{k=n+1}^{N} x(k) x(k-n) \right]
                   for p < n < N                                        (22)

where the pseudo equation error term is specified by

    \tilde{e}(n) = \sum_{i=0}^{q} b_i \left[ \frac{1}{N-n} \sum_{k=n+1}^{N} \epsilon(k-i) x(k-n) \right],
                   p < n < N


Upon examination of this expression, it is clear that the expected value of the
term ε(k-i) x(k-n) will be zero. This would indicate that the general pseudo
equation error term \tilde{e}(n) will itself tend to be close to zero (this is
reinforced by the division by N-n). With this in mind, a logical choice for the
a_i coefficients used in expression (22) would be one which tends to minimize
the pseudo equation error sequence.

If one compares the pseudo equation error relationship (22) with the
equation error relationship (16), a similarity is in evidence. Namely, the
elements within the brackets of expression (22) are recognized as unbiased
autocorrelation estimates. If these estimates are substituted for the
entries of matrix R and vector r in relationship (19), a new system of
equations (21) for the optimum autoregressive coefficients arises. This new
system of equations distinguishes itself from the former in that a genuinely
different autocorrelation estimate formula is used for each equation. Once
this modified system of equations has been solved for the autoregressive
coefficients, the c_i coefficient estimates are obtained according to


    c_n^o = \left[ \frac{1}{N-p} \sum_{k=p+1}^{N} x^2(k) \right]
            + \sum_{i=1}^{n} a_i \left[ \frac{1}{N-p} \sum_{k=p+1}^{N} x(k) x(k-i) \right],
            0 \le n \le p                                               (23)

The required power spectral density estimate is then given by relationship
(15).
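
As an illustration of how the bracketed estimates in (22) fill the matrix R and vector r of (19), the following Python sketch builds both directly from the data. It is a sketch under stated assumptions rather than the author's code: the name indirect_system is hypothetical, the data are indexed x(1), ..., x(N) as in the text, and NumPy is assumed.

```python
import numpy as np

def indirect_system(x, p):
    """Build R and r for the indirect method from data x(1), ..., x(N).

    Row n (p < n < N) uses the estimates bracketed in (22):
      R[n, i] = 1/(N-n) * sum_{k=n+1}^{N} x(k-i) x(k-n),  i = 1, ..., p
      r[n]    = 1/(N-n) * sum_{k=n+1}^{N} x(k)   x(k-n)
    """
    x = np.asarray(x, dtype=float)
    N = len(x)
    xs = np.concatenate(([0.0], x))        # xs[k] = x(k), 1-based indexing
    lags = range(p + 1, N)                 # n = p+1, ..., N-1
    R = np.empty((len(lags), p))
    r = np.empty(len(lags))
    for row, n in enumerate(lags):
        k = np.arange(n + 1, N + 1)        # k = n+1, ..., N
        r[row] = np.dot(xs[k], xs[k - n]) / (N - n)
        for i in range(1, p + 1):
            R[row, i - 1] = np.dot(xs[k - i], xs[k - n]) / (N - n)
    return R, r

# Hypothetical usage on a short random record.
rng = np.random.default_rng(0)
R, r = indirect_system(rng.standard_normal(32), p=3)
print(R.shape, r.shape)
```

The resulting R and r can be passed to any weighted least squares solver for (21); note that each row uses its own averaging length N-n, which is what distinguishes this system from the direct method.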

V. NUMERICAL EXAMPLES

To test the effectiveness of the proposed ARMA spectral estimator method,
the classical problem of detecting the presence of sinusoids in additive noise
will be considered. In particular, we will investigate the specific case in
which the time series observations are generated according to

    x(n) = A_1 \sin(\pi f_1 n) + A_2 \sin(\pi f_2 n) + w(n),    1 \le n \le N    (24)




where w(n) is a zero mean Gaussian time series with variance one. This
particular problem serves as an excellent vehicle for measuring a spectral
estimator's performance relative to: (i) detecting the presence of sinusoids
in a strong noisy background, and (ii) resolving two sinusoids whose
frequencies f_1 and f_2 are nearly equal. The individual sinusoidal
signal-to-noise ratios (SNR) for the above signal are given by
20 log(A_k/\sqrt{2}) for k = 1, 2. In order to consider the effectiveness of
the proposed ARMA spectral estimator in different noise environments, we shall
consider two cases. These cases have been examined in reference [8], where the
performance of many modern spectral estimators is empirically compared.

CASE I: A_1 = \sqrt{20}, f_1 = 0.4 and A_2 = \sqrt{2}, f_2 = 0.426

In this example, we have two closely spaced (in frequency) sinusoids in
which the stronger sinusoid has a SNR of 10 dB while the weaker sinusoid has
a SNR of 0 dB. For this relatively low SNR case, the spectral estimator's
ability to resolve two closely spaced sinusoids, and simultaneously identify
the frequencies, will be tested. Upon generating the sequence (24) for N=64,
the indirect ARMA spectral estimator method was used for a selection of
weights w(n) = N-n and p = 15. The resultant spectrum is displayed in Fig. 2a
where the frequency resolving capability of this method is in evidence. The
frequency identification accuracy was also excellent in that the sinusoid
frequency estimates were f_1 = 0.398 and f_2 = 0.425.
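
For readers who wish to reproduce a test record of the form (24), a minimal Python sketch follows. The function names and the random seed are hypothetical choices made here, and NumPy is assumed; the noise realization will of course differ from the one used for the figures.

```python
import numpy as np

def two_sinusoids_in_noise(A1, f1, A2, f2, N, rng):
    """Generate x(n), 1 <= n <= N, according to (24) with unit-variance noise."""
    n = np.arange(1, N + 1)
    return (A1 * np.sin(np.pi * f1 * n)
            + A2 * np.sin(np.pi * f2 * n)
            + rng.standard_normal(N))

def snr_db(A):
    """SNR in dB of a sinusoid of amplitude A in unit-variance noise."""
    return 20.0 * np.log10(A / np.sqrt(2.0))

# Case I parameters as quoted in the text; the seed is arbitrary.
rng = np.random.default_rng(1979)
x = two_sinusoids_in_noise(np.sqrt(20.0), 0.4, np.sqrt(2.0), 0.426, 64, rng)
print(snr_db(np.sqrt(20.0)), snr_db(np.sqrt(2.0)))
```

The two printed values correspond to the 10 dB and 0 dB figures stated for Case I.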

For comparison purposes, the covariance AR spectral estimate (basically
the maximum entropy method) and the revised Box-Jenkins [6] ARMA estimate of
order 15 were generated using the same data. The results of these estimations
are displayed in Figures 2b and 2c where an inability to resolve the two
sinusoids is apparent. This gives evidence of the inherently superior
performance capability of the herein described ARMA spectrum estimator over
standard AR estimator procedures and other ARMA methods.

CASE II: A_1 = \sqrt{2}, f_1 = 0.32812, A_2 = \sqrt{2}, f_2 = 0.5

We are now examining the ability of the ARMA spectral estimator to
detect sinusoids in a low SNR environment (i.e., 0 dB). For a selection
of N = 64, w(n) = N-n and p = 5, the resultant ARMA spectral estimate is
displayed in Figure 3a. Clearly, one is able to detect the presence of the two
sinusoids, and the frequency estimates f_1 = 0.3202 and f_2 = 0.5012 are of
good quality considering the prevailing SNR environment. A 15th order
covariance AR spectral estimator was then found to generate the spectral
estimate displayed in Figure 3b. Although the two sinusoids were properly
detected, a number of false peaks are in evidence.


Digital  Filter  Design 


It is possible to use the proposed ARMA method for synthesizing digital
filters. To illustrate the approach that is taken, let us consider the
specific case of designing a low-pass filter of normalized cutoff frequency
f_c. One may readily show that the impulse response of an idealized version
of this low-pass filter is given by sin(π f_c n)/(π n). With this in mind, one
then applies the herein developed ARMA procedure to the specific sequence

    x(n) = \frac{\sin[\pi f_c (n - 0.5N)]}{\pi (n - 0.5N)},      1 \le n \le N

The resultant ARMA model obtained in this manner will have the attenuation
characteristics of the desired low-pass filter. To illustrate this, a 15th
order ARMA spectral estimate of this sequence was made for f_c = 0.2, N = 256
and.  w(n)  =[N-nJf  The  resultant  filter's  magnitude  characteristics  are  dis¬ 
played  in  Figure  k  where  the  low-pass  characteristics  are  in  evidence.  In 
a paper now in preparation, a detailed description of this filter synthesis
procedure will be given and compared with an alternate method [9].
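
The shifted ideal low-pass sequence used above can be generated as in the following Python sketch. The function name is hypothetical and NumPy is assumed; np.sinc(u) evaluates sin(πu)/(πu), so the sample at n = 0.5N, where the written formula is 0/0, takes its limiting value f_c.

```python
import numpy as np

def ideal_lowpass_sequence(fc, N):
    """x(n) = sin[pi*fc*(n - 0.5N)] / [pi*(n - 0.5N)] for n = 1, ..., N."""
    m = np.arange(1, N + 1) - 0.5 * N
    # sin(pi*fc*m)/(pi*m) = fc * sinc(fc*m), with np.sinc(u) = sin(pi u)/(pi u).
    return fc * np.sinc(fc * m)

# Parameters quoted in the text: fc = 0.2, N = 256.
x_lp = ideal_lowpass_sequence(0.2, 256)
print(x_lp[127])   # sample at n = 128, i.e. n - 0.5N = 0
```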

VI.  CONCLUSION 


A computationally efficient closed form method for generating ARMA
spectral estimates has been presented. Conceptually, the method offers the
promise of producing superior spectral estimation performance in comparison
to such AR spectral estimators as the autoregressive, linear predictive coding,
and maximum entropy methods. Empirical results have substantiated this
conjecture.


In order for this method to achieve its full potential, a number of
important considerations need further investigation. They include
determination of the most effective autocorrelation estimation procedure to
use, since an inferior procedure will generally result in poor spectral
estimates. Another important consideration is the choice of error weights.
This weighting selection should reflect, in some manner, our growing lack of
confidence in the autocorrelation estimates for increasing lags (n). Since
no statistical assumptions on the time series are being made (other than that
it is an ARMA time series), it is apparent that the weighting sequence should
be data dependent. One further consideration is that of determining a
procedure for obtaining the best choice of the ARMA ordering parameter p.

As a final point, it should be noted that the herein presented procedure
can be used to generate ARMA spectral estimates from basic AR methods. In
particular, one could use any standard AR method (e.g., the maximum entropy
method) to generate the autoregressive coefficient estimates. Using these
coefficient estimates and suitable autocorrelation estimates (often byproducts
of an AR method), one then uses relationship (18) and finally expression
(15) to generate an ARMA spectral estimate. This will result in little
additional computational cost over the "pure" AR method due to the simplicity
of relationships (15) and (18). The effectiveness of this hybrid approach will
be subsequently reported upon.


VII.  ACKNOWLEDGEMENT 

This  opportunity  is  taken  to  acknowledge  the  contributions  of  Koji  Ogino 
and  Behshad  Baseghi  in  this  effort. 




VIII.  REFERENCES 


[1] L. H. Koopmans, THE SPECTRAL ANALYSIS OF TIME SERIES, Academic Press,
New York, 1974.

[2] D. G. Childers, MODERN SPECTRUM ANALYSIS, IEEE Press, 1978.

[3] P. R. Gutowski, E. A. Robinson, and S. Treitel, "Spectral Estimation,
Fact or Fiction", IEEE Trans. Geoscience Electronics, Vol. GE-16, No. 2,
pp. 80-84, April 1978.

[4] S. A. Tretter and K. Steiglitz, "Power-Spectrum Identification in Terms
of Rational Models", IEEE Trans. Automatic Control, Vol. AC-12,
pp. 185-188, April 1967.

[5] G. Box and G. Jenkins, TIME SERIES ANALYSIS: FORECASTING AND CONTROL,
Revised Edition, Holden-Day, San Francisco, 1976.

[6] M. Kaveh, "High Resolution Spectral Estimation for Noisy Signals", IEEE
Trans. Acoustics, Speech, and Signal Processing, Vol. ASSP-27, No. 3,
pp. 286-287.

[7] J. F. Kinkel, J. Perl, L. Scharf, A. Stubberud, "A Note on
Covariance-Invariant Digital Filter Design and Autoregressive Moving Average
Spectrum Analysis", IEEE Trans. Acoustics, Speech, and Signal Processing,
Vol. ASSP-27, No. 2, pp. 200-202, April 1979.

[8] T. M. Sullivan, O. L. Frost, J. R. Treichler, "High Resolution Signal
Estimation", ARGO Systems Inc. Tech Report, June 1978.

[9] L. L. Scharf and J. C. Luby, "Statistical Design of Autoregressive-Moving
Average Digital Filters", IEEE Trans. Acoustics, Speech, and Signal
Processing, Vol. ASSP-27, No. 3, pp. 240-247, June 1979.




EXTRAPOLATING  BANDLIMITED  SIGNALS 
WITH  NOISE  AND  QUANTIZATION 


KENNETH  ABEND  AND  JUDITH  R.  PLATT 


RCA  Government  Systems  Division 
Missile  and  Surface  Radar 
Moorestown,  New  Jersey  08057 


Abstract 


In  many  applications  of  spectral  analysis  (e.g.,  short  radar  dwells)  the 
need  arises  to  obtain  spectral  resolution  from  an  extremely  short  time  segment 
of  a  bandlimited  signal.  Instead  of  applying  a  window  and  assuming  the 
function  to  be  zero  outside  of  the  observed  segment,  the  modern  approach  is  to 
extrapolate.  Knowledge  of  the  bandwidth  of  the  signal  allows  for  accurate 
extrapolation  over  a  limited  time  interval  that  is  many  times  the  length  of 
the  given  segment,  provided  that  the  signal  is  sampled  at  many  times  the 
Nyquist  rate.  Of  several  alternate  ways  to  find  a  bandlimited  signal  of 
minimum  energy  that  fits  the  observed  samples,  Cadzow's  method  is  the  simplest 
because  it  utilizes  a  matrix  whose  size  is  determined  by  the  number  of  samples 
before  extrapolation.  However,  the  ill  conditioned  nature  of  the  Cadzow 
matrix  makes  it  extremely  difficult  to  extrapolate  coarsely  quantized,  noisy 
signals. We solve this problem with an iterative procedure that preserves
the small-matrix advantage of Cadzow's method. Examples involving non-ideal
signals  quantized  to  as  few  as  five  bits  are  investigated  in  order  to  determine 
the  extent  of  reliable  extrapolation. 

Introduction  and  Summary 

The purpose of this paper is to demonstrate that a time limited set of
samples of a bandlimited signal can be extrapolated with limited computer
resources and realistic signals. When Cadzow's one-step extrapolation
procedure [1-3] is applied to finely quantized samples of a bandlimited signal
in bandlimited noise, the signal and noise are satisfactorily extrapolated.
When the noise bandwidth is increased and/or when the noisy samples are
coarsely quantized, problems arise due to limited computer accuracy because a
set of linear equations is ill conditioned. We show that by solving these
equations iteratively by the method of steepest descent, reliable
extrapolation can be performed in the presence of noise with the input
quantized to as few as five bits plus sign.

By using signals consisting of the sum of sine waves with unequal
amplitudes and phases, we obtain a more realistic picture of the limitations
of the algorithm. Specifically, while the input may be limited to a small or
a large fraction of a Nyquist interval, the extrapolated signal is seldom
reliable for much more than three Nyquist intervals.

Bandlimited  Extrapolation 

Let the signal g(t) be bandlimited to |f| < B, i.e., its Fourier transform
G(f) = \int_{-\infty}^{\infty} g(t) \exp(-j 2\pi f t) dt satisfies

    G(f) = 0 for |f| > B.                                               (1)

If g(t) = 0 for 0<t<T with T>0, then g(t) = 0 for all t. By considering
g(t) - g_1(t) with g(t) = g_1(t) for 0<t<T, we see that if g(t) is given for
0<t<T, it is uniquely determined for all t. Cadzow's one-step extrapolation
procedure is based on finding a signal z(t) with the following properties.


(a) z(t) = 0 for t<0 and t>T

(b) Bandlimiting z(t) to B produces a signal that agrees with g(t) for
0<t<T (and hence for all t), i.e.,

    g(t) = \int_{-B}^{B} \left[ \int_{0}^{T} z(\tau) \exp(-j 2\pi f \tau) d\tau \right] \exp(j 2\pi f t) df
         = 2B \int_{0}^{T} z(\tau) \mathrm{sinc}[2B(t - \tau)] d\tau          (2)

where

    \mathrm{sinc}\, u = \frac{\sin(\pi u)}{\pi u}                              (3)


Let g(t) be sampled at r times the Nyquist rate, i.e., Δ < 1/(2B), where Δ
is the sampling interval and 1/(2B) is the Nyquist interval, so that:

    r = \frac{1}{2B\Delta} > 1                                          (4)


If we are given only M samples of g(t): g(mΔ), m = 1, 2, ..., M, we approximate
equation (2) by

    g(m\Delta) = \sum_{k=1}^{M} \frac{1}{r} \mathrm{sinc}\left( \frac{m-k}{r} \right) z(k\Delta)      (5)

Since z(t) is not bandlimited, Δ must be very small. In this sampled-data
low-pass case, Cadzow's algorithm simplifies to:

1. Solve M linear equations in M unknowns for z(kΔ), k=1,2,...,M from
equation (5) with m = 1,2,...,M.

2. Using these M values of z, determine g(mΔ) for m<0 and m>M from
equation (5).
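
The two steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program (which ran on a desktop computer): the function names are hypothetical, NumPy is assumed, and a direct solver is used for step 1, which is only adequate when the matrix is reasonably well conditioned (small M and modest oversampling ratio r).

```python
import numpy as np

def cadzow_matrix(M, r):
    """M x M matrix with elements sinc((m-k)/r)/r, from equation (5)."""
    idx = np.arange(1, M + 1)
    return np.sinc((idx[:, None] - idx[None, :]) / r) / r

def one_step_extrapolate(g, r, m_out):
    """Extrapolate samples g(m*Delta), m = 1..M, to the sample indices m_out."""
    g = np.asarray(g, dtype=float)
    M = len(g)
    # Step 1: solve the M equations (5) for z(k*Delta), k = 1..M.
    z = np.linalg.solve(cadzow_matrix(M, r), g)
    # Step 2: evaluate (5) at the requested sample indices.
    k = np.arange(1, M + 1)
    m_out = np.asarray(m_out)
    kernel = np.sinc((m_out[:, None] - k[None, :]) / r) / r
    return kernel @ z

# Hypothetical usage: six samples of a cosine at twice the Nyquist rate.
g = np.cos(2 * np.pi * 0.1 * np.arange(1, 7))
g_ext = one_step_extrapolate(g, r=2.0, m_out=np.arange(-3, 10))
print(g_ext.shape)
```

For heavily oversampled or noisy data the direct solve in step 1 becomes unreliable, which is the difficulty the iterative solution discussed later in the paper is meant to address.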

More generally, we can simultaneously interpolate (see footnote 1) and
extrapolate by using

    g(t) = 2B\Delta \sum_{k=1}^{M} \mathrm{sinc}[2B(t - k\Delta)] z(k\Delta)          (6)

For the bandpass case (F_1 < |f| < F_2) Cadzow develops equations similar to
(2) and (5), with the sinc function replaced by the difference of two sinc
functions. However, if we use the complex envelope, g(t), where

    g_{BP}(t) = \mathrm{Re}[ g(t) \exp(j 2\pi f_0 t) ]                          (7)

with f_0 = (F_2 + F_1)/2, then equations (1) through (6) remain valid with
B = (F_2 - F_1)/2 (the bandwidth is 2B). We can sample the complex signal
(see footnote 2) g(t) at r times the Nyquist rate (2B) and utilize equation (5)
in steps 1 and 2 above to extrapolate both the real and imaginary parts of
g(t). Preliminary results, obtained to date, seem to be nearly independent of
whether the real or complex formulation is used. The results presented in this
paper were obtained on the Hewlett-Packard System 45 desktop computer using
the real formulation.

1. In contrast to Nyquist interpolation by either
g(t) = \sum_m \mathrm{sinc}(t/\Delta - m) g(m\Delta) or
g(t) = \sum_m \mathrm{sinc}(2Bt - m) g(m/2B).

2. This is commonly referred to as I and Q sampling.

We express the sampled signal in the form

    g_m = \sum_{k=1}^{M} h_{mk} z_k                                      (8)

and, for m = 1, 2, ..., M, in the vector-matrix form

    G = H Z                                                             (9)

Here G and Z are M-dimensional column vectors with elements g_m = g(mΔ) and
z_k = z(kΔ), and H is an MxM Toeplitz matrix with elements
h_mk = sinc((m-k)/r)/r in the low-pass case. Jain and Ranganath [4] have shown
that given G, (8) and (9) produce a minimum norm least squares solution for N
values of g(mΔ) with N > M.

The  most  direct  solution  involves  inversion  of  the  matrix  H.  This  is 
especially  useful  when  the  bandwidth  is  known  a  priori  and  H  can  be  inverted 
off  line.  Because  of  the  ill  conditioned  nature  of  H,  Jain  and  Ranganath 
propose  iterative  solutions  that  involve  NxN  matrices.  By  utilizing  only 
MxM  matrices,  we  reduce  the  computational  complexity  by  an  order  of 
magnitude.  We  therefore  consider  the  solution  of  (9)  alone,  and  use  that 
solution  in  (8). 

Method of Steepest Descent [5]

This technique, which progresses geometrically to the true solution from
an initial guess, proved to offer valid results for all examples, whether
ill-conditioned or well-conditioned. The iterative procedure is initiated with
an initial estimate Z^{(0)} of the vector values and continues with the single
step iterative equation for a symmetric positive definite matrix H:

    Z^{(k+1)} = Z^{(k)} + \alpha^{(k)} R^{(k)}                           (10)

Here R^{(k)} is the vector

    R^{(k)} = [ r_1^{(k)}, r_2^{(k)}, \ldots, r_M^{(k)} ]^T

with elements

    r_i^{(k)} = g_i - \sum_{j=1}^{M} h_{ij} z_j^{(k)}

and the scalar \alpha^{(k)} is given by

    \alpha^{(k)} = \frac{ (R^{(k)}, R^{(k)}) }{ (H R^{(k)}, R^{(k)}) }

The p-step iterative equation is

    Z^{(k+1)} = Z^{(k)} + \sum_{\ell=0}^{p-1} \alpha_\ell^{(k)} H^{\ell} R^{(k)}

with the scalar values \alpha_\ell^{(k)} determined from

    \sum_{\ell=0}^{p-1} \alpha_\ell^{(k)} ( H^{\ell+i+1} R^{(k)}, R^{(k)} ) = ( H^{i} R^{(k)}, R^{(k)} ),
    i = 0, 1, \ldots, p-1

where p is the number of iterative steps per computation, H^{\ell} is the
product of the matrix with itself \ell times, and the notation (A,B)
represents the dot product of A and B.
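
A minimal Python sketch of the single step iteration (10) is given below. It assumes the residual is defined as R = G - HZ and uses the classical steepest descent step length for a symmetric positive definite matrix; the function name, the fixed iteration count, and the small test system are hypothetical, and NumPy is assumed.

```python
import numpy as np

def steepest_descent(H, g, iterations, z0=None):
    """Iterate Z <- Z + alpha R with R = g - H Z and alpha = (R,R)/(HR,R).

    H is assumed to be symmetric positive definite.
    """
    H = np.asarray(H, dtype=float)
    g = np.asarray(g, dtype=float)
    z = np.zeros_like(g) if z0 is None else np.asarray(z0, dtype=float).copy()
    for _ in range(iterations):
        R = g - H @ z                  # residual vector
        denom = R @ (H @ R)            # (H R, R)
        if denom == 0.0:               # residual is exactly zero: converged
            break
        z = z + (R @ R) / denom * R    # step of length alpha along R
    return z

# Hypothetical, well-conditioned 2 x 2 example.
H_small = np.array([[2.0, 1.0], [1.0, 3.0]])
g_small = np.array([1.0, 2.0])
z_sd = steepest_descent(H_small, g_small, iterations=200)
print(z_sd)
```

Because each pass only needs matrix-vector products with the M x M matrix H, the iteration can be stopped early, which acts as an implicit safeguard when H is ill conditioned.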


Results  With  Inaccurate  Matrix  Inversion 

To demonstrate that results are obtainable ... of steepest descent is
required.

As an arbitrary example, g(t) was chosen to be the sum of two cosine
waves:

    g(t) = A_1 \cos(2\pi f_1 t + \theta_1) + A_2 \cos(2\pi f_2 t + \theta_2)

with A_1 = 0.5, A_2 = 1.0, f_1 = 0.4, f_2 = 1/6, θ_1 = 0, and θ_2 = π/3
(Figure 1a). We assume we know only that the signal is low-pass limited to
B = 0.5. Given seven samples at eight times the Nyquist rate over a duration
of 3/4 of a Nyquist interval (Figure 1), a 4:1 extrapolation was obtained
(Figure 1b for -1<t<2). With uniformly distributed independent noise samples
(40 dB down) extrapolation was not achieved. With bandlimited Gaussian noise
(|f| < 0.5), however (Figure 2a), extrapolation was no problem (Figure 2b).

On  the  other  hand,  using  simple  matrix  inversion  required  at  least  14  bits 
quantization  of  the  input  samples  without  noise  (Figure  3),  and  more  than  32 
bits  with  noise  (Figure  4). 

Though some improvements in the noisy quantized case were obtained
either by using Levinson's algorithm [6,7] or by adding a small constant to
the diagonal elements of H before inverting [8], accurate extrapolation with
coarse quantization was obtained only by the method of steepest descent.

The noise samples for Figures 2a, 2b, and 4 were obtained by adding 12
uniformly distributed random variables to produce each of seven independent
normal noise samples, and then multiplying that seven dimensional noise
vector by the lower triangular decomposition of the desired covariance
matrix, H. When this same example was run using the steepest descent
algorithm, 4:1 extrapolation was obtained even at 10 dB S/N
and 6 bits quantization (simultaneously). Only at 4 bits quantization did
the problems observed with the inaccurate algorithm reappear.
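
The noise construction described above can be sketched in Python as follows. Several details are assumptions made here rather than statements from the paper: the uniform variates are taken on (0, 1) and the sum of twelve is shifted by 6 to give zero mean and unit variance, the lower triangular factor is computed by Cholesky decomposition, and the function name is hypothetical. NumPy is assumed.

```python
import numpy as np

def correlated_noise(H, rng):
    """Approximately Gaussian noise vector with covariance matrix H.

    Each white sample is a sum of 12 uniform(0, 1) variates minus 6
    (assumed normalization), giving zero mean and unit variance.
    """
    H = np.asarray(H, dtype=float)
    M = H.shape[0]
    white = rng.random((M, 12)).sum(axis=1) - 6.0
    L = np.linalg.cholesky(H)          # lower triangular factor, H = L L'
    return L @ white

# Hypothetical usage with a 7 x 7 sinc covariance matrix at r = 2.
idx = np.arange(7)
H_cov = np.sinc((idx[:, None] - idx[None, :]) / 2.0) / 2.0
noise = correlated_noise(H_cov, np.random.default_rng(0))
print(noise.shape)
```

For the heavily oversampled case in the text (r = 8) the matrix is close to singular, so a numerical Cholesky factorization may fail in floating point; the milder ratio r = 2 is used here only to keep the sketch well behaved.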

Results  With  Method  of  Steepest  Descent 

A variety of examples were used to test the extrapolation algorithm
incorporating the method of steepest descent. To demonstrate the results
using the real bandpass formulation of the algorithm, another arbitrary
sum of two cosine waves was selected. Its parameters are A_1 = 0.5, A_2 = 1,
f_1 = 1/25, f_2 = 1/30, θ_1 = 0, and θ_2 = π/3; this is shown in Figure 5.
Using the passband 1/36 < |f| < 1/24 and nine samples at intervals of Δ = 0.5
starting at t=1 and terminating at t=5, the signal was extrapolated over the
interval -50 to +65, yielding 251 points of which 150 appear to be valid.

The effect of quantization is illustrated in Figure 6 using the same
example but quantizing the samples at various levels. Figure 6 shows the
results for 9 bits plus sign, 7 bits plus sign and for 5 bits plus sign.
Valid extrapolations with quantizations as low as 4 bits plus sign were
obtained. Extrapolation with bandlimited noise added to signal samples is
illustrated in Figure 7 for rms noise amplitudes of -25 dB, -20 dB, -15 dB,
-10 dB and 0 dB. Figure 8 illustrates the effect of noise and quantization;
Figure 8a shows the effect of noise and quantization for a particular noise
level and Figure 8b shows a particular sample of noise at various signal to
noise ratios.

Each alteration of the signal samples (quantization, bandlimited noise,
bandlimited noise and quantization, etc.) creates a different signal. The
extrapolation procedure produces a minimum norm least squares estimate [4] of
the altered signal, i.e., the bandlimited signal matching the given samples
that has the minimum energy.

Valid estimation of the signal passband is crucial to the results
obtained. This is demonstrated in Figure 9 where the extrapolated signal
using various passbands is compared to the true signal. It is evident that
the better the estimate of the passband, the better the extrapolation and the
longer its extent.

Another parameter of interest is the choice of sampled time interval.
For the same arbitrary signal, different time intervals were selected with all
other parameters unchanged; the results are shown in Figure 10.

Conclusions 


Different degrees of extrapolation are obtained with different signals
and different sampling rates. In the lowpass example we used seven
samples at eight times the Nyquist rate, spanning 3/4 of a Nyquist interval.
The extrapolated signal is valid for three Nyquist intervals, giving a 4:1
extrapolation. In the bandpass example we used nine samples at 75 times
the Nyquist rate, spanning 1/9 of a Nyquist interval. The extrapolated
signal is valid for two Nyquist intervals, giving an 18:1 extrapolation.
By increasing the spacing between samples we reduce the extrapolation ratio
and therefore can increase the duration of validity of the extrapolated
signal only slightly. By examining many other examples we are led to the
conclusion that, with a small number of samples of a non-ideal signal,
reliable extrapolation is limited to a very few Nyquist intervals. Thus
spectral analysis of the extrapolated signal must be performed by
techniques other than the Fourier transform (e.g., maximum entropy).

Cadzow's one step extrapolation procedure is by far the most easily
implemented, in that for extrapolating from M points to N points only an
MxM matrix need be inverted. The problem that arises is that this matrix is
ill conditioned and thus difficult to invert accurately enough to obtain
reasonable results with imperfect signals (having noise and quantization).
We solve this problem by solving the M equations in M unknowns iteratively
by the method of steepest descent.

Jain and Ranganath [4] have recently described two iterative procedures,
also designed to overcome this same problem. The first is to use steepest
descent to improve the rate of convergence of Papoulis' [9] iterative
extrapolation procedure. The second, a conjugate gradient algorithm, is
aimed directly at the final answer, as is our method. However, both of
their iterative procedures work with NxN matrices rather than MxM matrices,
where N >> M. As they point out [4, Sect. 7.5], Cadzow's method (involving
an MxM matrix, H) produces a minimum norm least squares solution. Its
shortcoming, the ill conditioned nature of H, we have overcome.

Bibliography

1. Cadzow, J.A., 1979, "An Extrapolation Procedure For Band-Limited Signals",
IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-27,
No. 1, pp. 4-12.

2. Cadzow, J.A., 1978, "Reconstruction of Signals From Their Linear Mapping
Image", IEEE International Conference on Acoustics, Speech and Signal
Processing, pp. 646-650.

3. Cadzow, J.A., 1978, "Improved Spectral Estimation From Incomplete
Sampled Data Observations", Proceedings of the RADC Spectrum Estimation
Workshop, pp. 109-123.

4. Jain, A.K. and Ranganath, S., 1978, "Extrapolation and Spectral
Estimation Techniques For Discrete Time Signals", Department of
Electrical Engineering, University of California, Davis, California.
Final Report under Contract F30602-75-C-0122, Rome Air Development
Center, Griffiss A.F.B., N.Y.

5. Berezin, I.S. and Zhidkov, N.P., 1975, Computing Methods, Vol. II,
Pergamon Press, Oxford, Great Britain, Chapter 6.

6. Zohar, Shalav, 1974, "The Solution of a Toeplitz Set of Linear Equations",
Journal of the Association for Computing Machinery, Vol. 21, No. 2,
pp. 272-276.

7. Levinson, N., 1947, "The Wiener RMS (root mean square) Error Criterion
in Filter Design and Prediction", J. Math. Phys., Vol. 25, pp. 261-278.

8. Wiley, R.G., 1979, "Concerning the Recovery of a Bandlimited Signal or
its Spectrum from a Finite Segment", IEEE Transactions on
Communications, Vol. COM-27, No. 1, pp. 251-252.

9. Papoulis, A., 1975, "A New Algorithm in Spectral Analysis and Bandlimited
Extrapolation", IEEE Transactions on Circuits and Systems, Vol. CAS-22,
No. 9, pp. 735-742.


[Figure: Sum of Cosines. (a) True signal and given samples.]

[Figure: Independent runs. (b) Extrapolated signal.]

[Figure: 32 bits quantization (several independent runs) using inaccurate
matrix inversion.]

[FIGURE 7. Signals in Bandlimited Noise. Vertical axis: amplitude.]

[FIGURE 8A. Noise and Quantization (7 bits plus sign; 20 dB SNR). Vertical
axis: amplitude.]


ACCURACY  OF  SPECTRAL  ESTIMATES 
OF  BAND-LIMITED  SIGNALS 

WILLIAM  B.  GORDON 

Radar  Division,  Code  5308 
Naval  Research  Laboratory 
Washington,  D.C.  20375 

ABSTRACT 

We consider the problem of estimating the spectrum of a band-limited
signal perturbed by additive white noise. Sharp bounds on the mean square
errors of linear spectral estimates are computed and expressed as functions
of time-bandwidth products and signal-to-noise power ratios.

1.  INTRODUCTION 

A central problem in the theory of stationary time series is to estimate
the spectrum of a stationary process N(t) when the given data consists of
samples of s(t) = f(t) + N(t), where f(t) is a deterministic trend of known
functional form. In this paper we shall consider the dual problem: the
functional form of f(t) is unknown, the second order statistics of N(t) are
known, and the problem is to estimate the spectrum of f(t). The signals f(t)
will be assumed to be band-limited functions having continuous Fourier
transforms f(v) which vanish for |v| > W. Such signals are pulse-like, and the
amount of useful information contained in an observation time window depends
as much on the position of the window as it does on its length. Accurate
spectral estimation requires that the observation time window capture a
significant portion of the signal energy.

We shall find that when the data is sampled at the Nyquist rate 2W,
consistent spectral estimators do not exist, in the sense that infinitely
accurate estimates cannot be obtained from infinitely long data records.
For signals with effective time duration $T_0$ and signal-to-noise power ratio
(S/N), most of the useful information is contained in a time window of
length $N_c/(2W)$, where $N_c = (1/\pi)\sqrt{(2WT_0)^3 (S/N)}$. For any linear
spectral estimator there exist signals whose corresponding spectral estimates
have relative mean square errors on the order of $(1/N_c)$ and absolute mean
square errors which are almost as large as the largest produced by the
conventional transform with $N_c$ data points.

If the data is sampled at a rate higher than 2W, longer time windows can
be effectively used, and the conventional Fourier transform provides a
consistent spectral estimator as the data rate increases without bound. If,
however, the time window is fixed, consistent linear spectral estimators do
not exist. Hence, to summarize, consistent linear spectral estimators exist
only if both the data rate and the length of the time window increase without
bound.


The problem of spectral estimation is essentially different from the
problem of spectral peak detection and location, and hence our pessimistic
results concerning the former do not preclude the possibility of high
resolution (supergain) spectral peak detectors, such as have been recently
proposed for band-limited signals [5,6,11,12]. However, improved resolution
has as a necessary consequence a decrease in accuracy and detectability, and
we hope to discuss this matter in subsequent papers.

2.  THE SPACE H(W)

We shall consider a class H(W) of complex-valued band-limited functions
$f = f(t)$ whose Fourier transforms $\hat f = \hat f(\nu)$ are continuous and
vanish for $|\nu| > W$, so that

$$f(t) = \int_{-W}^{+W} \hat f(\nu)\, \exp[2\pi i \nu t]\, d\nu , \qquad (2.1)$$

$$\hat f(\nu) = \int_{-\infty}^{+\infty} f(t)\, \exp[-2\pi i \nu t]\, dt . \qquad (2.2)$$


We shall also require the functions f in H(W) to satisfy $M^2(f) < \infty$,
where the moment $M^2 = M^2(f)$ is defined by

$$M^2 = 4\pi^2 \int_{-\infty}^{+\infty} t^2\, |f(t)|^2\, dt = \int_{-W}^{+W} |\hat f'(\nu)|^2\, d\nu . \qquad (2.3)$$


We follow the standard terminology of radar and communications theory
[7,14] in defining the signal energy $E = E(f)$, mean time $\bar t = \bar t(f)$,
and effective time duration $T_0 = T_0(f)$ by

$$E = \int_{-\infty}^{+\infty} |f(t)|^2\, dt = \int_{-W}^{+W} |\hat f(\nu)|^2\, d\nu ,$$

$$\bar t = (1/E) \int_{-\infty}^{+\infty} t\, |f(t)|^2\, dt ,$$

$$T_0^2 = (4\pi^2/E) \int_{-\infty}^{+\infty} (t - \bar t)^2\, |f(t)|^2\, dt .$$


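Using Sobolev space techniques we get the fundamental inequalities

$$|f(t)|^2 \le \frac{W M^2}{2\pi^2 t^2} \left[ 1 - \frac{\sin^2(2\pi W t)}{(2\pi W t)^2} \right] , \qquad (2.4)$$

$$|\hat f(\nu)|^2 \le M^2\, \frac{W^2 - \nu^2}{2W} . \qquad (2.5)$$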


The time-bandwidth product $2WT_0$ will appear frequently in our subsequent
discussion; one can easily establish the "uncertainty relation"

$$2W T_0 \ge \pi ,$$

which is the sharpest possible inequality of this type for H(W).
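The uncertainty relation can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original paper: it restricts attention to real, even spectra (so that the mean time is zero and $T_0^2 = M^2/E$), and the function name `time_bandwidth` and the two test spectra are chosen here for illustration. The cosine spectrum $\cos(\pi\nu/(2W))$ attains the bound, while a triangular spectrum gives $2\sqrt{3}$.

```python
import numpy as np

def trapezoid(y, x):
    """Plain trapezoidal rule (written out to avoid version-specific NumPy names)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def time_bandwidth(spectrum, W, n=200001):
    """Return 2*W*T0 for a real, even spectrum f_hat(nu) supported on [-W, W].

    Assumption: for such spectra the mean time is zero, so
    T0**2 = M**2 / E, with M**2 = integral of |f_hat'(nu)|**2 (eq. 2.3)
    and E = integral of |f_hat(nu)|**2.
    """
    nu = np.linspace(-W, W, n)
    f_hat = spectrum(nu)
    d_f_hat = np.gradient(f_hat, nu)      # numerical derivative of f_hat
    m2 = trapezoid(d_f_hat**2, nu)
    energy = trapezoid(f_hat**2, nu)
    return 2.0 * W * np.sqrt(m2 / energy)

if __name__ == "__main__":
    W = 2.0
    print(time_bandwidth(lambda nu: np.cos(np.pi * nu / (2 * W)), W))  # close to pi
    print(time_bandwidth(lambda nu: 1 - np.abs(nu) / W, W))            # close to 2*sqrt(3)
```

Both test spectra are continuous and vanish at $\pm W$, so they belong to H(W).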


3.  THE ERROR FUNCTIONS $Q^2$, $R^2$, $R_0^2$

When the data is unperturbed by noise, $\hat f(\nu)$ will be estimated by
discrete transforms of the type

$$\hat f_*(\nu) = \sum_n g_n f(t_n) ,$$

and we shall derive inequalities of the type

$$|\hat f(\nu) - \hat f_*(\nu)|^2 \le M^2\, Q^2(\nu, t, g) \qquad (3.1)$$

where $Q^2 = Q^2(\nu, t, g)$ is a certain explicit function of the frequency
$\nu$, the sample point set $t = \{t_n\}$, and the weights $g = \{g_n\}$. These
inequalities are sharp in the sense that for every given set of values for
$\nu$, t, g there exists a function f in H(W) for which the inequality (3.1)
becomes an equality. Hence the function $Q^2$ can be interpreted as the largest
possible squared error in $\hat f_*(\nu)$ (when noise is absent) normalized by
the moment $M^2$.

Suppose now that the data is perturbed by additive white noise N(t) with
zero mean. The data consists of samples of s(t) = f(t) + N(t), and the
problem is to estimate $\hat f(\nu)$ by sums of the type $\sum_n g_n s(t_n)$.
This estimate has mean and bias equal to $\hat f_*(\nu)$ and
$(\hat f(\nu) - \hat f_*(\nu))$, resp., where $\hat f_*$ has the same meaning as
before. The variance is independent of f and is given by

$$\mathrm{Variance}\Big(\sum_n g_n s(t_n)\Big) = \sigma^2 \sum_n |g_n|^2$$

where $\sigma^2 = E[|N(t)|^2]$. For any f in H(W) the mean square error in the
spectral estimate $\sum_n g_n s(t_n)$ is given by
$R^2(f) = (\mathrm{bias})^2 + \mathrm{variance}$, or

$$R^2(f) = |\hat f(\nu) - \hat f_*(\nu)|^2 + \sigma^2 \sum_n |g_n|^2 . \qquad (3.2)$$

We define $R_0^2 = \sup \{R^2(f)\}$, where the supremum is taken over the set of
all f in H(W) having a given value of $M^2$. Then from the sharpness of the
inequality (3.1) we have

$$R_0^2 = M^2\, Q^2(\nu, t, g) + \sigma^2 \sum_n |g_n|^2 , \qquad (3.3)$$

and for every $\varepsilon > 0$ there exist f in H(W) for which
$R^2(f) > R_0^2 - \varepsilon$. Explicit formulas for $Q^2(\nu, t, g)$ will be
given in Section 7 for arbitrary values of $\nu$, t, and g.
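The decomposition of the mean square error into squared bias plus $\sigma^2 \sum_n |g_n|^2$ can be illustrated by simulation. The following is a minimal sketch under assumptions made here for illustration only: real Gaussian noise, the test signal $f(t) = \mathrm{sinc}^2(Wt)$ (whose transform is a triangle with $\hat f(0) = 1/W$), Nyquist sample points, and the conventional weights $g_n = 1/(2W)$ at $\nu = 0$. The function name `mse_decomposition` and all parameter values are hypothetical.

```python
import numpy as np

def mse_decomposition(W=1.0, N=10, sigma=0.01, trials=40000, seed=0):
    """Compare a Monte Carlo mean square error with bias^2 + sigma^2 * sum(g_n^2).

    Everything here is evaluated at nu = 0 for the assumed test signal
    f(t) = sinc^2(W t), sampled at t_n = n / (2W), n = -N..N.
    """
    rng = np.random.default_rng(seed)
    n = np.arange(-N, N + 1)
    t = n / (2.0 * W)                        # Nyquist sample points
    g = np.full(t.shape, 1.0 / (2.0 * W))    # conventional weights at nu = 0
    f = np.sinc(W * t) ** 2                  # np.sinc(x) = sin(pi x) / (pi x)
    f_hat_0 = 1.0 / W                        # exact transform of f at nu = 0
    bias = f_hat_0 - np.sum(g * f)
    predicted = bias**2 + sigma**2 * np.sum(g**2)
    noise = rng.normal(0.0, sigma, size=(trials, t.size))
    estimates = (f + noise) @ g              # one spectral estimate per trial
    measured = np.mean((estimates - f_hat_0) ** 2)
    return measured, predicted

if __name__ == "__main__":
    measured, predicted = mse_decomposition()
    print(measured, predicted)
```

The two returned numbers should agree to within Monte Carlo fluctuation; the bias term comes entirely from the samples that fall outside the window.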



4.  THE PARAMETER $N_c$

For simplicity we now assume that the sample point set $\{t_n\}$ is given by
$t_n = n/(2W)$ where the integer n varies over the set $\{n : -N \le n \le N\}$.
The number of samples will be denoted by $N_s$ $(= 2N + 1)$, and T will denote
the length of the time window. Hence $N_s \approx 2WT$ and the time window is
[-T/2, T/2]. When the conventional Fourier transform is used
($g_n = (1/2W) \exp[-\pi i n \nu / W]$) it turns out that $Q^2$ is independent
of $\nu$ and is very closely approximated by

$$Q^2 \approx 2W / (\pi^2 N_s) .$$

Hence, when the data is sampled at the Nyquist rate and the conventional
Fourier transform is used we have

$$R_0^2 = \frac{2W M^2}{\pi^2 N_s} + \frac{\sigma^2 N_s}{4W^2} , \qquad (4.1)$$

and $R_0^2$ is a convex function of $N_s$ whose minimum value is attained at
$N_s = N_c$ where

$$N_c = \left[ \frac{8 W^3 M^2}{\pi^2 \sigma^2} \right]^{1/2} .$$

For functions f satisfying $\bar t(f) = 0$ we have $M^2 = T_0^2 E$ and therefore

$$N_c = (1/\pi) \left[ (2WT_0)^3\, (S/N) \right]^{1/2} \qquad (4.2)$$

where the signal-to-noise power ratio is defined by

$$(S/N) = \frac{\text{time-averaged signal power}}{\text{average noise power}} . \qquad (4.3)$$
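As a numerical illustration of the trade-off behind $N_c$, the sketch below assumes that the worst-case error for the conventional transform has the form $R_0^2(N_s) = 2WM^2/(\pi^2 N_s) + \sigma^2 N_s/(4W^2)$, that is, the approximation $Q^2 \approx 2W/(\pi^2 N_s)$ combined with $\sum_n |g_n|^2 = N_s/(4W^2)$. The function names and the parameter values are hypothetical. It compares the closed-form minimizer with a brute-force search over integer sample counts.

```python
import numpy as np

def r0_squared(n_s, W, M2, sigma2):
    """Assumed worst-case mean square error for the conventional transform.

    First term: truncation (bias) error, decreasing in n_s.
    Second term: noise variance, increasing in n_s.
    """
    return 2.0 * W * M2 / (np.pi**2 * n_s) + sigma2 * n_s / (4.0 * W**2)

def n_c(W, M2, sigma2):
    """Minimizer of r0_squared over n_s, treated as a continuous variable."""
    return np.sqrt(8.0 * W**3 * M2 / (np.pi**2 * sigma2))

if __name__ == "__main__":
    W, M2, sigma2 = 1.0, 1.0, 1e-4
    grid = np.arange(1, 1000)
    best = grid[np.argmin(r0_squared(grid, W, M2, sigma2))]
    print(best, n_c(W, M2, sigma2))
```

Beyond $N_c$ samples the added noise variance outweighs the reduction in truncation error, which is why lengthening the record at the Nyquist rate stops helping.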


Although $R_0^2$ becomes an increasing function of $N_s$ when $N_s > N_c$, we
have no right to assume that the same is true for $R^2(f)$ for any particular f.
However, it can be shown that for any f in H(W), $R^2(f)$ eventually becomes an
increasing function of $N_s$, and using some gross estimates this can be shown
to be the case when $N_s \ge N_c^2$.

5.  THE MAIN RESULT

We shall now present our main result (equation (5.1) below) which shows
that consistent spectral estimators do not exist when the noise is white and
the data is sampled at the Nyquist rate. In the last paragraph we saw that
when the conventional Fourier transform is used the error function $R_0^2$
eventually becomes an increasing function of $N_s$ as $N_s \to \infty$. For each
value of $N_s$ we shall now choose a set of weights g which is "optimal" in the
sense that it minimizes the right-hand side of (3.3) for given values of $M^2$
and $\sigma^2$. These weights will be called "$\sigma$-optimal", and when they
are used $R_0^2$ becomes a monotonically decreasing function of $N_s$. We define

$$R_\infty^2 = \lim_{N_s \to \infty} R_0^2$$

and from the definitions it is evident that

$$R_\infty^2 = \inf \{R_0^2\}$$

where the infimum is taken over the set of all linear spectral estimates
obtained by sampling at the Nyquist rate. Hence, for any such linear spectral
estimate and for any $\varepsilon > 0$ there exist functions f in H(W) satisfying


This expression vanishes only when $\nu = \pm W$, which is a reflection of the
fact that $\hat f(\pm W) = 0$ for all f in H(W).

Let $X_c$ denote the smallest possible value of $R_0^2$ when the conventional
transform is used, i.e., the value obtained by setting $N_s = N_c$ in (4.1), and
let $X_\infty$ denote the right-hand side of (2.5). Then $X_\infty$ is the mean
square error of the "trivial" estimate $\hat f(\nu) = 0$, and it is reasonable
to compare $R_\infty^2$ with $X_\infty$ in a neighborhood of $\nu = \pm W$. The
ratio $R_\infty^2 / X_\infty$ can also be interpreted as a bound on the relative
errors $R^2(f) / |\hat f(\nu)|^2$. For there exist f for which
$R^2(f) > R_\infty^2$, and for any f we have $|\hat f(\nu)|^2 \le X_\infty$ since
$X_\infty$ is the right-hand side of (2.5). Hence, there are always f for which
$R^2(f) / |\hat f(\nu)|^2$ exceeds $R_\infty^2 / X_\infty$. From an examination
of the ratios $R_\infty^2 / X_c$ and $R_\infty^2 / X_\infty$ we draw the
following conclusions.

Conclusion #1. When the data is sampled at the Nyquist rate every linear
spectral estimator produces mean square errors having the same order of
magnitude as the largest produced by a conventional Fourier transform with
$N_c$ data points, except near $\nu = \pm W$ where the errors have the same
order of magnitude as those produced by the trivial estimate $\hat f(\nu) = 0$.

Conclusion #2. When the data is sampled at the Nyquist rate every linear
spectral estimator produces relative mean square errors which are on the
order of $1/N_c$ near $\nu = 0$ and unity near $\nu = \pm W$.

6.  OVERSAMPLING

In this section we consider the effects of oversampling. That is,
we now suppose that functions of class H(W) are sampled at a rate 2kW, k > 1.
The derivations of (4.1) and (5.1) require the closed-form inversion of
certain matrices, which, unfortunately, we have been unable to effect for the
case of oversampling. Hence, we are presently unable to give a quantitative
description of how much useful information is contained in a fixed time
window when the data rate is increased without bound. However, it can be
shown that consistent spectral estimators exist only if both the data rate
2kW and the length T of the time window are allowed to increase without
bound. (Cf. the discussion in Section 13.1 of Blackman and Tukey [4] which
suggests the existence of results of this nature.) Moreover, because of the
pulse-like nature of the functions f in H(W), it is also necessary to prolong
the time window in both directions, since otherwise $R_0^2$ will not converge
to its infimum as $T \to \infty$.

These results can be proved by a reduction to the "previous case" of
Nyquist sampling. For by Shannon's Sampling Theorem every finite linear
combination of values f[n/(2kW)] can be expressed as an infinite linear
combination of values f[n/(2W)].
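The reduction to Nyquist sampling rests on the fact that any oversampled value f[m/(2kW)] is itself a sinc-weighted sum of the Nyquist samples f[n/(2W)]. The sketch below illustrates this with a truncated interpolation series; the test function $\mathrm{sinc}^2(Wt)$, the truncation length, and the function name are assumptions made here for illustration.

```python
import numpy as np

def from_nyquist_samples(t, f, W, n_terms=2000):
    """Approximate f(t) by the truncated Shannon series over Nyquist samples.

    f(t) = sum_n f(n / (2W)) * sinc(2 W t - n), with np.sinc(x) = sin(pi x)/(pi x).
    The infinite sum is cut off at |n| <= n_terms, so the result is approximate.
    """
    n = np.arange(-n_terms, n_terms + 1)
    return float(np.sum(f(n / (2.0 * W)) * np.sinc(2.0 * W * t - n)))

if __name__ == "__main__":
    W, k = 1.5, 3
    f = lambda t: np.sinc(W * t) ** 2        # band-limited to [-W, W]
    t_over = 7.0 / (2.0 * k * W)             # a point on the oversampled grid
    print(from_nyquist_samples(t_over, f, W), f(t_over))
```

Because each oversampled value needs infinitely many Nyquist samples, a finite oversampled record corresponds to an infinite (but weighted) Nyquist record.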

7.  DERIVATIONS

We shall now write $\|f\|$ for the norm $M(f) = [M^2(f)]^{1/2}$, and we let
$(\cdot,\cdot)$ denote the corresponding inner product, so that

$$(f,g) = \int_{-W}^{+W} D\hat f(\nu)\, \overline{D\hat g(\nu)}\, d\nu$$

where D denotes the differentiation operator. This norm is chosen because it
can be expressed in terms of physically meaningful parameters, and because
the maps $f \to f(t)$ and $f \to \hat f(\nu)$ are continuous in this norm. (This
last property is not enjoyed by the usual $L^2$ norm.) Hence, there exist
functions $K_t = K_t(s)$ and $e_\nu = e_\nu(s)$ in H(W) which satisfy

$$f(t) = (f, K_t), \qquad \hat f(\nu) = (f, e_\nu) .$$

Hilbert function spaces H for which the maps $f \to f(t)$ are continuous are
called reproducing kernel Hilbert spaces, and the function
$K(s,t) = K_t(s) = (K_t, K_s)$ is called the reproducing kernel. When H = H(W)
we have

$$K_s(t) = \frac{1}{4\pi^3} \left\{ \frac{\sin[2\pi W(t-s)]}{s\,t\,(t-s)} - \frac{\sin(2\pi W s)\, \sin(2\pi W t)}{2\pi W s^2 t^2} \right\} , \quad (s,\ t,\ s-t \ne 0)$$

$$K_t(t) = \frac{1}{4\pi^3 t^3} \left\{ 2\pi W t - \frac{\sin^2(2\pi W t)}{2\pi W t} \right\} , \quad (t \ne 0)$$

$$K_t(0) = K_0(t) = \frac{1}{4\pi^3 t^3} \left\{ \sin(2\pi W t) - (2\pi W t) \cos(2\pi W t) \right\} , \quad (t \ne 0)$$

$$K_0(0) = \tfrac{2}{3} W^3 . \qquad (7.1)$$

To derive these results we integrate $(f, K_t)$ by parts and compare the
results to (2.1). It is easily seen that $\hat K_t(\nu)$ is the (unique)
solution to $D^2 \hat K_t(\nu) = -\exp[-2\pi i \nu t]$ which satisfies the
boundary conditions $\hat K_t(\pm W) = 0$.

In a similar fashion, one establishes that

$$e_\nu(t) = \frac{1}{4\pi^2 t^2} \left\{ e^{2\pi i \nu t} - \cos(2\pi W t) - (i\nu/W) \sin(2\pi W t) \right\} , \quad (t \ne 0)$$

$$e_\nu(0) = (1/2)(W^2 - \nu^2) \qquad (7.2)$$

$$\|e_\nu\|^2 = (W^2 - \nu^2)/(2W) .$$

From Hilbert space generalities we get

$$Q = \sup_f \frac{|\hat f(\nu) - \hat f_*(\nu)|}{\|f\|} = \Big\| e_\nu - \sum_n g_n K_{t_n} \Big\| ,$$

or,

$$Q^2 = \langle Kg, g \rangle - \langle g, J \rangle - \langle J, g \rangle + \|e_\nu\|^2 \qquad (7.3)$$

where the matrix K = K(t), and the vector J = J($\nu$,t), are defined by

$$K_{nm} = (K_{t_m}, K_{t_n}) = K_{t_m}(t_n) \qquad (7.4)$$

$$J_n = (e_\nu, K_{t_n}) = e_\nu(t_n) . \qquad (7.5)$$

The results (7.1) - (7.5) can now be substituted into (3.3), and it is
easily seen that the minimization of $R_0^2$ with respect to g requires the
inversion of the matrix $K + (\sigma^2/M^2) I$.
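The final step can be sketched numerically. The code below is an illustration under assumptions, not the author's computation: it works at $\nu = 0$ only (where $e_0(t) = (1 - \cos 2\pi W t)/(4\pi^2 t^2)$ is real, so all quantities are real), uses Nyquist sample points, and takes the closed forms of the reproducing kernel as read from (7.1) and (7.2). All function names are hypothetical. It evaluates $Q^2$ for the conventional weights and solves $(K + (\sigma^2/M^2) I)\, g = J$ for the $\sigma$-optimal weights.

```python
import numpy as np

def kernel(s, t, W):
    """Reproducing kernel K_s(t) of H(W), closed forms as in (7.1)."""
    s, t = float(s), float(t)
    a = 2.0 * np.pi * W
    if s == 0.0 and t == 0.0:
        return 2.0 * W**3 / 3.0
    if s == 0.0 or t == 0.0:
        u = t if s == 0.0 else s
        return (np.sin(a * u) - a * u * np.cos(a * u)) / (4.0 * np.pi**3 * u**3)
    if s == t:
        return (a * t - np.sin(a * t) ** 2 / (a * t)) / (4.0 * np.pi**3 * t**3)
    return (np.sin(a * (t - s)) / (s * t * (t - s))
            - np.sin(a * s) * np.sin(a * t) / (a * s**2 * t**2)) / (4.0 * np.pi**3)

def e0(t, W):
    """Representer of f -> f_hat(0), i.e. (7.2) evaluated at nu = 0."""
    if t == 0.0:
        return 0.5 * W**2
    return (1.0 - np.cos(2.0 * np.pi * W * t)) / (4.0 * np.pi**2 * t**2)

def worst_case_errors(W=1.0, N=50, M2=1.0, sigma2=1e-4):
    """Return (Q^2 conventional, R_0^2 conventional, R_0^2 sigma-optimal) at nu = 0."""
    t = np.arange(-N, N + 1) / (2.0 * W)            # Nyquist sample points
    K = np.array([[kernel(s, u, W) for u in t] for s in t])
    J = np.array([e0(u, W) for u in t])
    e_norm2 = W / 2.0                               # ||e_0||^2 = W^2 / (2W)

    def q2(g):
        return g @ K @ g - 2.0 * (g @ J) + e_norm2  # real case of (7.3)

    def r0(g):
        return M2 * q2(g) + sigma2 * (g @ g)        # (3.3)

    g_conv = np.full(t.size, 1.0 / (2.0 * W))       # conventional weights at nu = 0
    g_opt = np.linalg.solve(K + (sigma2 / M2) * np.eye(t.size), J)
    return q2(g_conv), r0(g_conv), r0(g_opt)

if __name__ == "__main__":
    print(worst_case_errors())
```

With W = 1 and 101 samples, the conventional-weight value of $Q^2$ should land close to $2W/(\pi^2 N_s)$, and the $\sigma$-optimal weights cannot do worse than the conventional ones.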




REFERENCES 


1.  N. Aronszajn, "Theory of Reproducing Kernels", Trans. AMS 68 (1950),
p. 337-404.

2.  J. Barros-Neto, "An Introduction to the Theory of Distributions",
Marcel Dekker, New York, 1973.

3.  L. Bers, F. John, M. Schechter, "Partial Differential Equations",
Interscience, New York, 1964.

4.  R. B. Blackman and J. W. Tukey, "The Measurement of Power Spectra",
Dover, New York, 1959.

5.  J. A. Cadzow, "Improved Spectral Estimation from Incomplete Sampled
Data Observations", RADC Spectrum Analysis Workshop, Rome Air Development
Center, Rome, N.Y., May 1978, p. 85-96.

6.  J. A. Cadzow, "An Extrapolation Procedure for Band-Limited Signals",
IEEE Trans. on Acoustics, Speech, and Signal Processing ASSP-27 (1979),
p. 4-12.

7.  G. W. Deley, "Waveform Design", in Radar Handbook (M. I. Skolnik, ed.),
McGraw-Hill, New York, 1970.

8.  Haber, "Numerical Estimation of Multiple Integrals", SIAM Review 12
(1970), p. 481-526.

9.  R. S. Palais, "Foundations of Global Non-Linear Analysis", Benjamin,
New York, 1968.

10. A. Papoulis, "Limits on Band-Limited Signals", Proc. IEEE 55 (1967),
p. 1677-1686.

11. A. Papoulis, "A New Algorithm in Spectral Analysis and Band-Limited
Extrapolation", IEEE Trans. on Circuits and Systems CAS-22 (1975),
p. 735-742.

12. A. Papoulis and C. Chamzas, "Adaptive Extrapolation and Hidden
Periodicities", RADC Spectrum Analysis Workshop, Rome Air Development
Center, Rome, N.Y., May 1978, p. 85-96.




COMPENSATION  OF  AUTOREGRESSIVE  SPECTRAL  ESTIMATES 
FOR  THE  PRESENCE  OF  WHITE  OBSERVATION  NOISE 


STEVEN  KAY 


Raytheon  Company 
Submarine  Signal  Division 
Portsmouth,  RI  02871 


Abstract 

The autoregressive spectral estimator possesses excellent resolution properties for
time series which satisfy the "all-pole" assumption. When noise is added to the time
series under analysis, the resolution of the spectral estimator degrades rapidly. The
usual approach to this problem is to model the resulting time series by the more
appropriate autoregressive-moving average process and to use standard time series
analysis techniques to identify the autoregressive parameters. This standard technique,
however, does not result in a positive-definite autocorrelation matrix. Thus, the
resulting spectral estimator may exhibit a large increase in variance. An alternative
approach, termed the noise compensation technique, is proposed. It attempts to
correct the estimated reflection coefficients for the effect of white noise, assuming the
noise variance is known. Simulation results indicate that a significant decrease in the
degrading effects of noise may be effected using the noise compensation technique.

I. Introduction

Autoregressive (AR) spectral estimation has received much attention lately in many
diverse fields. Although based upon different theoretical foundations, Maximum Entropy
Spectral Estimation [1], used in seismic signal processing, and Linear Spectral
Prediction [2], used in speech signal processing, are in practice identical to AR spectral
estimation. The principal advantage of the AR estimate over conventional Fourier-based
spectral estimators is its enhanced resolution properties. [3] However, it has
been shown that much of this increased resolution is lost when observation noise is
added to the AR time series. [4] The reason for the degradation of the spectral
estimate in the presence of noise is that the AR assumption, i.e., that the time series can
be represented as the output of an all-pole filter excited by white noise, is no longer
valid. [5] Thus, the lower the signal-to-noise ratio (SNR), the more the "all-pole"
assumption is violated, and the poorer the spectral estimate obtained.
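A short worked equation shows why the all-pole assumption fails. The notation here is an assumption introduced for illustration: $A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k}$ is the AR polynomial, $\sigma_u^2$ is the variance of the white noise driving the AR process, and $\sigma_W^2$ is the variance of the uncorrelated white observation noise. The spectra of the two components add, so

$$S_y(z) = \frac{\sigma_u^2}{A(z) A(z^{-1})} + \sigma_W^2 = \frac{\sigma_u^2 + \sigma_W^2\, A(z) A(z^{-1})}{A(z) A(z^{-1})} .$$

The numerator is no longer a constant: it has degree p in both $z$ and $z^{-1}$, so the noisy series behaves as an ARMA(p,p) process whose AR part is unchanged. As $\sigma_W^2$ grows relative to $\sigma_u^2$, the numerator zeros move toward the poles and flatten the spectral peaks.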

The usual approach to this problem is to model the noise corrupted time series by
the appropriate autoregressive-moving average (ARMA) process and to use standard
time series analysis techniques to identify the autoregressive parameters. [6], [7]
This approach estimates the AR parameters $\{a_k,\ k = 1, 2, \ldots, p\}$ as the
solution of the equations:

$$\begin{bmatrix} R_y(p) & R_y(p-1) & \cdots & R_y(1) \\ R_y(p+1) & R_y(p) & \cdots & R_y(2) \\ \vdots & \vdots & & \vdots \\ R_y(2p-1) & R_y(2p-2) & \cdots & R_y(p) \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = - \begin{bmatrix} R_y(p+1) \\ R_y(p+2) \\ \vdots \\ R_y(2p) \end{bmatrix} \qquad (1)$$

where $R_y(k)$ is the autocorrelation function estimate of the ARMA(p,p) process, $Y_t$.
This standard technique, however, does not result in a positive-definite autocorrelation
matrix. Thus the spectral estimator may exhibit a large increase in variance. [8]

An alternative method, termed the noise compensation technique, is proposed. It
attempts to correct or compensate the estimated reflection coefficients for the effects
of the white observation noise. Via this method the autocorrelation matrix may be
easily checked for positive-definiteness by assuring $|\hat K_i^c| < 1$, i = 1, 2, ..., p,
where $\hat K_i^c$ denotes the noise compensated reflection coefficient estimate.

II. Noise Compensation Technique

The noise compensation technique is now derived.

Assume that we are given a data record $\{Y_t,\ t = 1, 2, \ldots, N\}$ where
$Y_t = X_t + W_t$. $X_t$ is an AR process of order p and $W_t$ is white noise with
variance $\sigma_W^2$. The Burg estimate of $K_i$, the reflection coefficient, is

$$\hat K_i = \frac{-2 \sum_{t=1}^{N-i} e_{t+i}^{(i-1)}\, b_t^{(i-1)}}{\sum_{t=1}^{N-i} \left[ \left( e_{t+i}^{(i-1)} \right)^2 + \left( b_t^{(i-1)} \right)^2 \right]}$$

where

$$e_{t+i}^{(i-1)} = Y_{t+i} + \sum_{k=1}^{i-1} \hat a_k^{(i-1)}\, Y_{t+i-k} \qquad (2a)$$

$$b_t^{(i-1)} = Y_t + \sum_{k=1}^{i-1} \hat a_k^{(i-1)}\, Y_{t+k} \qquad (2b)$$


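Since $\hat K_i$ will be biased due to the noise, one is more interested in
obtaining an estimate of:

$$K_i^* = \frac{-2\, E\left[ e_{t+i}^{*(i-1)}\, b_t^{*(i-1)} \right]}{E\left[ \left( e_{t+i}^{*(i-1)} \right)^2 + \left( b_t^{*(i-1)} \right)^2 \right]}$$

where * indicates the value of the quantity for $Y_t = X_t$, i.e., no observation
noise present.

Let $A_{i-1}(z) = 1 + \sum_{k=1}^{i-1} \hat a_k^{(i-1)} z^{-k}$.

Assuming that the $W_t$ process is uncorrelated with the $X_t$ process and
$E(W_t) = E(X_t) = 0$, it can be shown that

$$E\left[ \left( e_{t+i}^{(i-1)} \right)^2 \right] = E\left[ \left( e_{t+i}^{*(i-1)} \right)^2 \right] + \sigma_W^2 \left[ 1 + \sum_{k=1}^{i-1} \left( \hat a_k^{(i-1)} \right)^2 \right]$$

$$E\left[ e_{t+i}^{(i-1)}\, b_t^{(i-1)} \right] = E\left[ e_{t+i}^{*(i-1)}\, b_t^{*(i-1)} \right] + \sigma_W^2 \sum_{k=1}^{i-1} \hat a_k^{(i-1)}\, \hat a_{i-k}^{(i-1)}$$

with the same correction term for $E[(b_t^{(i-1)})^2]$ as for
$E[(e_{t+i}^{(i-1)})^2]$. Therefore,

$$K_i^* = -2\, \frac{E\left[ e_{t+i}^{(i-1)}\, b_t^{(i-1)} \right] - \sigma_W^2 \sum_{k=1}^{i-1} \hat a_k^{(i-1)}\, \hat a_{i-k}^{(i-1)}}{E\left[ \left( e_{t+i}^{(i-1)} \right)^2 + \left( b_t^{(i-1)} \right)^2 \right] - 2 \sigma_W^2 \left[ 1 + \sum_{k=1}^{i-1} \left( \hat a_k^{(i-1)} \right)^2 \right]}$$

In order to relate $K_i^*$ to the Burg estimate, we take as estimators of the
expected values the following:

$$\hat E\left[ e_{t+i}^{(i-1)}\, b_t^{(i-1)} \right] = \frac{1}{N-i-(i-1)} \sum_{t=1}^{N-i} e_{t+i}^{(i-1)}\, b_t^{(i-1)}$$

$$\hat E\left[ \left( e_{t+i}^{(i-1)} \right)^2 \right] = \frac{1}{N-i-(i-1)} \sum_{t=1}^{N-i} \left( e_{t+i}^{(i-1)} \right)^2$$

with the analogous estimator for $E[(b_t^{(i-1)})^2]$. Note that the estimates
have accounted for the (i-1) degrees of freedom lost in estimating
$\{\hat a_k^{(i-1)}\}$, which are needed to generate $e_{t+i}^{(i-1)}$,
$b_t^{(i-1)}$. Then,

$$\hat K_i = \frac{-2\, \frac{1}{N-i-(i-1)} \sum_{t=1}^{N-i} e_{t+i}^{(i-1)}\, b_t^{(i-1)}}{\frac{1}{N-i-(i-1)} \sum_{t=1}^{N-i} \left( e_{t+i}^{(i-1)} \right)^2 + \frac{1}{N-i-(i-1)} \sum_{t=1}^{N-i} \left( b_t^{(i-1)} \right)^2}$$

Letting $\hat K_i^c$ be our compensated estimate, we have

$$\hat K_i^c = -2\, \frac{\frac{1}{N-i-(i-1)} \sum_{t=1}^{N-i} e_{t+i}^{(i-1)}\, b_t^{(i-1)} - \sigma_W^2 \sum_{k=1}^{i-1} \hat a_k^{(i-1)}\, \hat a_{i-k}^{(i-1)}}{\frac{1}{N-i-(i-1)} \sum_{t=1}^{N-i} \left[ \left( e_{t+i}^{(i-1)} \right)^2 + \left( b_t^{(i-1)} \right)^2 \right] - 2 \sigma_W^2 \left[ 1 + \sum_{k=1}^{i-1} \left( \hat a_k^{(i-1)} \right)^2 \right]}$$

or finally,

$$\hat K_i^c = - \frac{2 \sum_{t=1}^{N-i} e_{t+i}^{(i-1)}\, b_t^{(i-1)} - 2 (N-2i+1)\, \sigma_W^2 \sum_{k=1}^{i-1} \hat a_k^{(i-1)}\, \hat a_{i-k}^{(i-1)}}{\sum_{t=1}^{N-i} \left[ \left( e_{t+i}^{(i-1)} \right)^2 + \left( b_t^{(i-1)} \right)^2 \right] - 2 (N-2i+1)\, \sigma_W^2 \left[ 1 + \sum_{k=1}^{i-1} \left( \hat a_k^{(i-1)} \right)^2 \right]}$$

To be more realistic, we replace $\sigma_W^2$ by $\alpha_i \sigma_W^2$, where
$0 < \alpha_i < 1$. Thus,

$$\hat K_i^c = - \frac{2 \sum_{t=1}^{N-i} e_{t+i}^{(i-1)}\, b_t^{(i-1)} - 2 (N-2i+1)\, \alpha_i \sigma_W^2 \sum_{k=1}^{i-1} \hat a_k^{(i-1)}\, \hat a_{i-k}^{(i-1)}}{\sum_{t=1}^{N-i} \left[ \left( e_{t+i}^{(i-1)} \right)^2 + \left( b_t^{(i-1)} \right)^2 \right] - 2 (N-2i+1)\, \alpha_i \sigma_W^2 \left[ 1 + \sum_{k=1}^{i-1} \left( \hat a_k^{(i-1)} \right)^2 \right]}$$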


It is seen that $\hat K_i^c$ is not constrained to be between -1 and +1. To
maintain this range, we will need to choose $\{\alpha_i\}$ carefully. In the
simulation examples to follow, we choose $\alpha_i$ according to (4), a linear
function of i bounded by $\alpha_{MIN}$ and $\alpha_{MAX}$, where
$0 < \alpha_{MIN} < \alpha_{MAX} < 1$.

Thus, $\alpha_i$ decreases linearly with i, which reflects our confidence in the
estimates of the higher order reflection coefficients.
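The recursion can be sketched in code. The following is a minimal illustration, not the author's implementation: it assumes the prediction-error convention $e = Y + \sum_k \hat a_k Y$ of (2a) and (2b), the standard Levinson update $\hat a_k^{(i)} = \hat a_k^{(i-1)} + \hat K_i \hat a_{i-k}^{(i-1)}$ between stages, and takes the weights $\alpha_i$ as a user-supplied array because the exact linear schedule is not reproduced here. Setting `noise_var=0` gives the ordinary Burg estimate. The function name is hypothetical, and no clipping to (-1, +1) is applied.

```python
import numpy as np

def reflection_coefficients(y, p, noise_var=0.0, alpha=None):
    """Burg-type reflection coefficients with optional white-noise compensation.

    y         : data record Y_1..Y_N (zero-based array)
    p         : model order
    noise_var : assumed known observation-noise variance sigma_W^2
    alpha     : weights alpha_i, i = 1..p (assumed input; defaults to all ones)
    Returns (K_1..K_p, a_1..a_p).
    """
    y = np.asarray(y, dtype=float)
    N = y.size
    alpha = np.ones(p) if alpha is None else np.asarray(alpha, dtype=float)
    a = np.zeros(0)                      # a_k^{(i-1)}, empty at stage i = 1
    ks = []
    for i in range(1, p + 1):
        t = np.arange(N - i)             # zero-based version of t = 1..N-i
        e = y[t + i].copy()              # forward error, eq. (2a)
        b = y[t].copy()                  # backward error, eq. (2b)
        for k in range(1, i):
            e += a[k - 1] * y[t + i - k]
            b += a[k - 1] * y[t + k]
        scale = (N - 2 * i + 1) * alpha[i - 1] * noise_var
        num = 2.0 * np.sum(e * b) - 2.0 * scale * np.sum(a * a[::-1])
        den = np.sum(e**2 + b**2) - 2.0 * scale * (1.0 + np.sum(a**2))
        k_i = -num / den
        ks.append(k_i)
        a = np.concatenate([a + k_i * a[::-1], [k_i]])   # Levinson update
    return np.array(ks), a

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 20000
    x = np.zeros(N)
    u = rng.normal(size=N)
    for n in range(1, N):
        x[n] = 0.9 * x[n - 1] + u[n]     # AR(1) signal, true K_1 = -0.9
    y = x + rng.normal(scale=np.sqrt(2.0), size=N)
    print(reflection_coefficients(y, 1)[0])                 # biased toward zero
    print(reflection_coefficients(y, 1, noise_var=2.0)[0])  # compensated
```

In this AR(1) example the uncompensated estimate is pulled toward zero by the noise power in the denominator, while the compensated estimate should sit near the noise-free value.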

To demonstrate the capability of the noise compensation technique several
simulation examples are given. The first example utilizes data composed of an AR(4)
process [9] with power spectral density given in Figure 1 and white Gaussian noise.
The SNR is 15 dB. 100 realizations of the conventional ARMA approach, which uses (1)
with the autocorrelation estimate

$$\sum_t Y_t\, Y_{t+k} ,$$

are shown in Figure 2. In Figure 3 the noise compensated estimates are plotted.
Note that p = 8 was used since p = 4 did not result in adequate resolution of the
spectral peaks. Comparing the figures, we see that not only is the variance of the
noise compensated spectral estimator less than that of the ARMA approach but the
spectral peaks are clearly visible whereas in the ARMA case only one peak is seen.
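For comparison, the conventional ARMA approach can be sketched as a linear solve of the modified Yule-Walker equations (1). This is an illustration under assumptions: the right-hand side carries a minus sign (consistent with the convention $e = Y + \sum_k a_k Y$), and the autocorrelation estimate is normalized by 1/N, which the source does not state. Function names are hypothetical.

```python
import numpy as np

def autocorr(y, max_lag):
    """Lagged-product autocorrelation estimate; the 1/N normalization is assumed."""
    y = np.asarray(y, dtype=float)
    N = y.size
    return np.array([np.dot(y[:N - k], y[k:]) / N for k in range(max_lag + 1)])

def modified_yule_walker(y, p):
    """Solve equations (1) for a_1..a_p using lags 1..2p of the noisy data.

    Row r, column c of the matrix holds R_y(p + r - c), so the first row is
    R_y(p), ..., R_y(1) and the last row is R_y(2p-1), ..., R_y(p).
    """
    R = autocorr(y, 2 * p)
    A = np.array([[R[p + r - c] for c in range(p)] for r in range(p)])
    rhs = -R[p + 1: 2 * p + 1]
    return np.linalg.solve(A, rhs)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N = 20000
    x = np.zeros(N)
    u = rng.normal(size=N)
    for n in range(1, N):
        x[n] = 0.9 * x[n - 1] + u[n]
    y = x + rng.normal(scale=np.sqrt(2.0), size=N)
    print(modified_yule_walker(y, 1))    # a_1 should be near -0.9
```

Because only lags of one or more enter the equations, white observation noise does not bias the solution, but nothing forces the implied autocorrelation matrix to be positive definite.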



Finally, a simulation was conducted for two equi-amplitude sinusoids in white
Gaussian noise. The peak of the spectral estimate was normalized to 0 dB and the
true spectral lines are indicated by arrows. The results are shown in Figures 4 and 5.
A large improvement is noted.

III. Conclusions

The noise compensation technique described in this paper offers an alternative and
possibly better method than the conventional ARMA approach of reducing the effects of
white observation noise on the autoregressive spectral estimator. When used properly,
the large bias error introduced by the noise can be significantly reduced.

References

1.  Burg, J. P., 1975, "Maximum Entropy Spectral Analysis", Ph.D. dissertation,
Stanford Univ.

2.  Makhoul, J., April 1975, "Linear Prediction: A Tutorial Review", Proc. IEEE,
Vol. 63, pp 561-580.

3.  Lacoss, R. T., August 1971, "Data Adaptive Spectral Analysis Methods",
Geophysics, Vol. 36, pp 661-675.

4.  Marple, S. L., Jr., 1977, "Resolution of Conventional Fourier, Autoregressive,
and Special ARMA Methods of Spectral Analysis", IEEE Int. Conf. on ASSP,
Hartford, CT.

5.  Kay, S. M., "The Effects of Noise on the Autoregressive Spectral Estimator", to
be published in the IEEE Trans. on Acoustics, Speech, and Signal Processing.

6.  Gersch, W., October 1970, "Estimation of the Autoregressive Parameters of a
Mixed Autoregressive Moving-Average Time Series", IEEE Trans. on Automatic
Control, pp 583-588.

7.  Box, G. E. P., Jenkins, G. M., 1970, Time Series Analysis: Forecasting and
Control, Holden-Day, San Francisco, CA.

8.  Kay, S. M., "Noise Compensation for Autoregressive Spectral Estimates",
submitted to the IEEE Trans. on Acoustics, Speech, and Signal Processing.

9.  Ulrych, T. J., Bishop, T. N., February 1975, "Maximum Entropy Spectral Analysis
and Autoregressive Decomposition", Reviews of Geophysics and Space Physics,
Vol. 13, pp 183-200.


ORDER  DETERMINATION  FOR  AUTOREGRESSIVE  SPECTRAL  ESTIMATION 


M.  KAVEH  AND  S.  P.  BRUZZONE 


Department  of  Electrical  Engineering 
University  of  Minnesota 
Minneapolis,  MN  55455 


Abstract

A  method  related  to  Akaike's  Information  Criterion  for  determining  the 
order  of  an  autoregressive  (AR)  spectral  estimator  is  discussed.  Examples 
are  shown  that  compare  the  performance  of  this  method  with  that  of  the  well- 
known  AIC. 


Introduction 


The trade-off between the bias (or resolution) and variance of a spectral
estimator is the central issue in spectral estimation by any method.
For the traditional (Blackman and Tukey type) spectral estimators, this
trade-off is reflected in the choice of the spectral window type and the
maximum lag of autocorrelation function used. This subject, referred to as
window carpentering, is discussed in detail by Jenkins and Watts [1], and is
straightforward because resolution is well-defined in terms of the spectral
window bandwidth.

With  the  popularity  of  data  adaptive  (notably  the  autoregressive  (AR)) 
spectral  estimation  methods,  similar  resolution-variance  trade-offs  are  in 
order.  Specifically,  well-defined  methods  are  needed  to  determine  the  order 
of  the  (AR)  spectral  estimator  for  a  given  data  sample.  Furthermore  for 
practical  applications,  these  methods  need  to  be  on-line  and  as  much  as 
possible  objective  in  nature.  This  problem  is  complicated,  however,  due  to 
the  data  dependent  nature  of  the  resolution  of  the  AR  spectral  estimator 
(e.g.,  no  well-defined  window  bandwidth).  Therefore,  the  question  of  order 
determination for the spectral estimator seems to be best posed as a procedure
for obtaining a compromise between the AR model fit and the variance of
the  estimated  AR  parameters  as  a  function  of  the  model  order. 

Akaike [2, 3] and Parzen [5] have recently introduced some methods for
automatic determination of orders of autoregressive processes. One method,
based on Akaike's Information Criterion (AIC), has gained special popularity.
In this paper, we follow the derivations on which AIC is based, introduce
appropriate modifications to account for practical estimation procedures and
derive a modified information criterion designated the Conditional AIC
(CAIC). Finally we present the results of a number of numerical simulations
that compare the performances of AIC and CAIC for spectral estimation.

Akaike's Information Criterion

Akaike derived his information criterion, AIC, as an estimate of the
asymptotic relative goodness of fit of the model to the observation. Although
his derivations were based on information theoretic arguments, the resulting
parameters were the same as the maximum likelihood estimates. In this section
we review the steps involved in obtaining AIC [3] as they pertain to the
derivation of the new criterion. We assume the time series to be described by

$$x_t = \sum_{i=1}^{L} a_i x_{t-i} + u_t , \quad t = 0, \ldots, N \qquad (1)$$

where $u_t$ is zero-mean white and Gaussian and L is to be determined. Through
asymptotic arguments, Akaike defines an information criterion, related to the
maximum likelihood of the estimates of $a_i$, $\hat a_i$, as:

$$\mathrm{AIC}(\hat A) = (-2) \ln (\text{maximum likelihood}) + E_\infty N\, \| \hat A - A \|^2 \qquad (2)$$

where $E_\infty$ denotes asymptotic expectation, and A and $\hat A$ are L x 1
vectors of the coefficients $a_i$ and their estimates $\hat a_i$. The practical
AIC, which is related to the full-information likelihood function of a Gaussian
process, is then given by

$$\mathrm{AIC}(L) = N \ln (\text{MLE of innovation variance}) + 2L \qquad (3)$$

and the order L is chosen that minimizes AIC(L).

The  New  Criterion 

Since the exact maximum likelihood (full information maximum-likelihood)
estimates are generally not available, the conditional MLE, or one based on the
Yule-Walker equations or Burg's algorithm, of the innovation variance is normally
used in (3). We propose using the conditional maximum likelihood (CML)
function in (2). This function is based exactly on the available data and, we
believe, is a more sensitive indicator of the behavior of the estimates used
in practice. Thus, in the following the CML estimate of A and its covariance
function are considered, in order to obtain tractable expressions for (2).

The  conditional  (partial  information)  likelihood  function  for  the  time 
series  in  (1)  is  given  by: 


L(A,c?u  .  .  ,x^) 


(2tt0  2) 
u 


T 

1  -CDC 

N-L  eXP  7~2 
— 7T-  2a 

o  2  u  J 


where  is  the  variance  of  the  innovation  sequence  u t. 


£  “  [l,-a^,-a2 . -a^]  anc* 

N 

Dij  "  V  "  k,L+1  Xk~i+1  Xk-j+1  * 


Furthermore, the CML estimate of $\sigma_u^2$ is given by:

$$\hat{\sigma}_u^2 = \hat{C}^T D \hat{C}/(N-L) \tag{6}$$

and a lower bound for the variance of the estimates of $a_i$ follows from
Fisher's information matrix to be [6]

$$\mathrm{var}[\hat{a}_i] \ge \frac{1}{N-L}\,\sigma_u^2 h_{ii} \tag{7}$$

where $h_{ii}$ is the $i$th diagonal element of the inverse of the (L + 1)-sample
covariance matrix of $x_t$. It can also be shown [6] that for an AR model
$\sigma_u^2 h_{ii} \ge 1$, thus

$$\mathrm{var}[\hat{a}_i] \ge \frac{1}{N-L}. \tag{8}$$




We now proceed to define an expression for (2) based on the CML estimates of
A and $\sigma_u^2$. Substituting the CML given in (4) for the maximum likelihood in
(2), using (8) for the second term in (2), and replacing N by (N - L), we have:

$$\mathrm{CAIC}(L) = (N-L)\ln(2\pi\hat{\sigma}_u^2) + (\alpha - 1)L \tag{9}$$
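One reading of the substitution that reproduces (9) is sketched below. It assumes that the conditional likelihood has the Gaussian form with exponent (N - L)/2, that the coefficient estimates are treated as unbiased so the expected squared error is the sum of their variances, that the weighting factor $\alpha$ scales the resulting bound, and that the constant N, which does not depend on L, is dropped. These steps are an interpretation, not a derivation quoted from the paper.

$$
\begin{aligned}
-2\ln L(\hat{A}, \hat{\sigma}_u^2) &= (N-L)\ln(2\pi\hat{\sigma}_u^2) + \frac{\hat{C}^T D \hat{C}}{\hat{\sigma}_u^2}
 = (N-L)\ln(2\pi\hat{\sigma}_u^2) + (N-L), \\
E\,(N-L)\,\|\hat{A}-A\|^2 &= (N-L)\sum_{i=1}^{L}\mathrm{var}[\hat{a}_i] \;\ge\; (N-L)\cdot\frac{L}{N-L} = L
 \;\longrightarrow\; \alpha L, \\
\text{sum} &= (N-L)\ln(2\pi\hat{\sigma}_u^2) + (\alpha-1)L + N .
\end{aligned}
$$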

The factor $\alpha > 1$ is included to account for the asymptotic nature of the
criterion and the fact that (8) is a lower bound for the variance of $\hat{a}_i$. A
similar parameter was also suggested for AIC [7], and in [3] Akaike discusses
a possible approach for choosing $\alpha$. Since CAIC(L) as given by (9) is
dependent on the variance of $x_t$, the test is standardized by introducing a
normalized innovation variance so that

$$\mathrm{CAIC}(L) = (N-L)\ln[\hat{\sigma}_u^2/\mathrm{var}(x_t)] + (\alpha-1)L \tag{10}$$


Thus CAIC(0) = 0. The factor $\alpha$ is chosen to give more or less weight to the
error in the estimation of the parameters. In other words, resolution can be
increased at the expense of the variance of the estimates by decreasing $\alpha$.

We have found empirically that values of $\alpha$ between 3.5 and 4 give the most
stable and reasonable indication of the order.
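The normalized criterion can be evaluated directly once an innovation-variance estimate is available for each candidate order. The sketch below is a minimal illustration; the function names and the default value of 4 for the weighting factor (taken from the empirical range quoted above) are choices made for this example, and the variance estimates may come from any of the estimators discussed in the text.

```python
import math

def caic(innovation_var, series_var, n, order, alpha=4.0):
    """Normalized criterion: (N - L) ln(var_L / var(x)) + (alpha - 1) L."""
    return (n - order) * math.log(innovation_var / series_var) + (alpha - 1.0) * order

def caic_order(variances, n, alpha=4.0):
    """variances[L] is the innovation-variance estimate for order L,
    with variances[0] equal to the variance of the series itself."""
    scores = [caic(v, variances[0], n, L, alpha) for L, v in enumerate(variances)]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

With variances = [1.0, 0.75, 0.75] and n = 100, the score at order 0 is zero by construction and the penalty (alpha - 1)L separates orders 1 and 2, whose fitted variances are equal.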


Simulation  Results 


We have tested the performance of CAIC relative to AIC on a number of
time series models reported previously. The data included normal as well as
uniform distributions. The estimates were based on CML (least-squares) and
Yule-Walker methods. In the great majority of cases, CAIC performed as well
as or better than AIC. Examples of these can be found in [8]. Some estimated
spectra based on orders determined by AIC and CAIC are also shown in Figures
1-3. Yule-Walker equations with autocorrelation function estimates given by


$$r_i = \frac{1}{N}\sum_{j=1}^{N-i} x_j\, x_{j+i}$$


were used. The example shown in Figure 1 indicates the relative stability of
CAIC. Figure 2 shows that the model order chosen by AIC results in spurious
peaks, while giving higher peak resolution than the CAIC-based one. Figure
3 shows that an increase in white noise level increased the AIC order to the
point that spurious spectral peaks became pronounced while the CAIC order
remained nearly the same, showing the relative stability of CAIC.




Conclusions 


Order  determination  for  the  AR  spectral  estimator  was  discussed.  A  new 
order  indicator  was  introduced  that  is  closely  related  to  the  AIC  method  of 
order  determination  for  AR  processes.  The  new  criterion,  CAIC,  fits  the 
practical estimation modes more closely and was found to be a relatively
stable indicator of the order which trades off resolution and variance of the
estimates.

References 


1. Jenkins, G. M., Watts, D. G., 1968, "Spectral Analysis and its
Applications", San Francisco, Holden-Day.

2. Akaike, H., 1970, "Statistical Predictor Identification", Annals of Inst.
Stat. Math., Vol. 22, No. 2.

3. Akaike, H., 1974, "A New Look at Statistical Model Identification", IEEE
Trans. on Automatic Control, Vol. AC-19, No. 6, December.

4. Akaike, H., 1977, "A Bayesian Extension of the Minimum AIC Procedure of
Autoregressive Model Fitting", Research Memo No. 126, The Institute of
Statistical Mathematics, November.

5. Parzen, E., 1975, "Multiple Time Series: Determining the Order of
Approximating Autoregressive Schemes", Tech. Report No. 23, July, SUNY
Buffalo Dept. Computer Science.

6. Kaveh, M., to be published, "Order Determination for Least-Squares
Predictor Identification".

7. Bhansali, R. J., Downham, D. Y., 1977, "Some Properties of the Order of
an Autoregressive Model Selected by a Generalization of Akaike's FPE
Criterion", Biometrika, Vol. 64.

8. Kaveh, M., 1979, "A Modified Akaike Information Criterion", Proceedings
of the 17th CDC, January.


[Figure 1 plots: AR (CAIC) and AR (AIC) power spectra versus frequency (Hz), panels (a) and (b).]

FIGURE 1. Power spectrum of xt=.4xt ...
a) N=100, CAIC=2, AIC=2.
b) N=500, CAIC=2, AIC=8.

[Power spectrum plots; caption fragment:] ... + n_t, $\sigma_n^2$ = .15; N=100; CAIC=6, AIC=10.

FIGURE 3. Log power spectrum of xt = sin(...t + 30°) + 0.707 sin(...t + ...)


DIFFICULTIES  PRESENT  IN  ALGORITHMS  FOR  DETERMINING  THE  RANK 
AND  PROPER  POLES  WITH  PRONY'S  METHOD 


MICHAEL  L.  VAN  BLARICUM 


Effects Technology, Inc.
5383  Hollister  Avenue 
Santa  Barbara,  California  93111 


Introduction 

This author presented a paper [1] at the 1978 RADC Spectral Estimation
Workshop which reviewed the algorithm of Prony. This Prony algorithm is used
to extract a system's natural resonances and associated residues from
transient response data. This procedure has great potential use in the analysis
of transient electromagnetic response data such as those from EMC and EMP
testing and from transient radar scattering.

Last year's paper addressed the algorithm and discussed the problem of
noise and its effect on Prony's method, with several solution methods being
suggested. In addition, a few examples were given. This paper will focus on
the specific problem of the determination of the order of the system being
analyzed. That is, it will address the question: How do you a priori
determine how many poles are contained in the response data?

The process of extracting the natural resonances and their associated
residues from a transient signal has four main steps as shown in Figure 1.
The details of these steps were discussed in reference [1]. A brief review
is given here.

The first step is the determination of the order of the system. At this
step one decides how many poles the system response function has so that the
proper model order can be obtained. It has been found, through trial and
error, that if the order of the system is underestimated then the extracted
poles will deviate substantially from the true poles. Similarly, if the order
of the system is overestimated the algorithm produces extraneous poles. The
presence of the extraneous poles causes the residues of the true poles to be
inaccurate and also results in unnecessary computation time. In addition, as
will be shown, attempting to solve a system whose order has been overestimated
will result in an ill conditioned matrix and present numerical problems. The
presence of noise in the data makes the determination of the system order a
very complex problem.




Once the order of the system has been determined, the coefficients of
Prony's difference equation must be solved for. This step basically involves
the solution of a linear matrix equation. The degree of difficulty of this step
depends on the noise level in the data and on the proper determination of the
order of the system. Generally this step is solved using a least squares
method; however, other solutions, such as recursive techniques, may be more
applicable depending on the condition of the data.

Once the difference equation coefficients are obtained, the roots of an
Nth order polynomial, N being the system order, must be found. Many
root-finding routines exist, but Muller's method [2] appears to be the optimal
method. While this is a key step in the procedure, it is totally dependent on
the accuracy of the coefficients which were obtained in the previous step.

The final step is the solution for the residues which are associated with
the system poles or singularities. These residues are obtained by solving a
simple linear matrix equation. Generally a least squared error ($L_2$ norm)
solution is used. However, Schaubert [3] has shown that if the uniform norm
($L_\infty$ norm) solution is found the accuracy of the residues is much better.
The $L_\infty$ norm solution does require the solution of a nonlinear problem and
hence requires more computation time. In many problems, such as target
identification, the residues are not even required and hence this is certainly
not a critical step.

It is this author's opinion that the first step, the determination of the
system order, is the key step in the total procedure. Up to this time many
methods have been used to determine the order, but they either break down when
noise is present or they are dependent on trial and error or the intervention
of the user. For analysis of massive amounts of data, as in the case of EMP
data, or for radar target identification, a totally automated method is a
must. The remainder of this paper will discuss the problem of order
determination. Some specific techniques will be discussed and numerical
examples will be presented. Finally, specific recommendations as to the
direction of future research in this area will be made.


Mathematical  Preliminaries 

In order to discuss the different methods for determining the order of a
system response before attempting to extract the resonances, it is necessary
to review some of the details of the Prony algorithm. The details of Prony's
method can be found in [1]. The notation used in [1] will be followed here
for easy reference. The transient response is modeled as a sum of N complex
exponentials,

$$R(t_n) = R_n = \sum_{i=1}^{N} A_i e^{s_i n \Delta t}, \qquad n = 0, 1, \ldots, M-1, \tag{1}$$

where $R(t_n)$ is the system response, the $s_i$ are the complex poles and the
$A_i$ are the corresponding residues. Expression (1) is written in discrete data
form where it has been assumed that the M samples are taken at equal time
increments $\Delta t$. Equation (1) represents M nonlinear equations in 2N unknowns.

The solution for the $s_i$ and the $A_i$ in (1) is based on the fact that the
$R_n$ must satisfy a difference equation of order N which may be written as

$$\sum_{p=0}^{N} \alpha_p R_{p+k} = 0, \qquad k = 0, 1, \ldots, \gamma-1, \tag{2}$$

where $\gamma$ is the value of M-N. This difference equation is referred to as
Prony's difference equation. Equation (2) is usually rewritten by defining
$\alpha_N$ equal to 1 so that the equation has the form

$$\sum_{p=0}^{N-1} \alpha_p R_{p+k} = -R_{N+k}. \tag{3}$$

If 2N data samples are used, then equation (3) can be solved exactly for the
$\alpha$'s. If more than 2N samples are used, then one can use a least squared
error fit to (3). It is the solution of this equation which is the second
step of Figure 1 and which yields the coefficients, $\alpha_p$, for the polynomial of
the third step. For convenience equations (2) and (3) can be rewritten in
matrix notation as

$$A x = b. \tag{4}$$

The matrix A is filled with the discrete response samples $R_{p+k}$ and is either
of dimension $\gamma$ by N+1 for equation (2) or $\gamma$ by N for equation (3). For
equation (2) the vector x is of length N+1 and contains the unknown coefficients
$\alpha_p$ and the vector b is equal to zero. For equation (3) the vector x is of
length N and the vector b is of length $\gamma$.
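Steps two through four of the procedure (difference-equation coefficients, polynomial roots, residues) can be sketched in Python with NumPy as follows. This is a minimal illustration, not the author's implementation: the function name `prony`, the use of `numpy.linalg.lstsq` for both linear solves and of `numpy.roots` in place of Muller's method are assumptions made for the example, and the order N is taken as known.

```python
import numpy as np

def prony(samples, order, dt):
    """Estimate poles s_i and residues A_i from equally spaced samples R_n."""
    R = np.asarray(samples, dtype=complex)
    M, N = len(R), order
    gamma = M - N                                   # number of difference equations
    # Step 2: least-squares solution of sum_{p<N} alpha_p R_{p+k} = -R_{N+k}
    A = np.array([R[k:k + N] for k in range(gamma)])
    b = -R[N:N + gamma]
    alpha, *_ = np.linalg.lstsq(A, b, rcond=None)
    # Step 3: roots of z^N + alpha_{N-1} z^{N-1} + ... + alpha_0 (alpha_N = 1)
    z = np.roots(np.concatenate(([1.0], alpha[::-1])))
    poles = np.log(z) / dt                          # z_i = exp(s_i * dt)
    # Step 4: residues from the linear system R_n = sum_i A_i z_i^n
    V = z[np.newaxis, :] ** np.arange(M)[:, np.newaxis]
    residues, *_ = np.linalg.lstsq(V, R, rcond=None)
    return poles, residues
```

For noise-free samples of exp(-2t) sin(pi t) taken at intervals of 0.1 s with order 2, the recovered poles should lie at -2 plus or minus i pi. Because the poles are obtained from the principal value of the logarithm, the sampling interval must be small enough that the imaginary part of s_i times dt stays within (-pi, pi].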

In order to obtain a proper solution to either equation (2) or (3) it is
necessary to know the value of N. This is equivalent to saying that for the
matrix problem defined in (4) we need to know the rank of the matrix A. If
N is picked too small then the matrix equation is underdetermined and will
give wrong answers. If N is picked too large then the matrix A is rank
deficient and will likely be ill conditioned.


The  Eigenvalue  Method 

This author [4] has presented a method for determining the rank of the
matrix A based on its eigenvalues. The details can be found in reference
[4]. The basic theory is that the response, $R_n$, of a system which contains
exactly N resonances will satisfy the difference equation (2) exactly.
Another way of looking at the problem is that there are exactly N mode
vectors for an Nth order system, where the mode vector $X_i$ is defined as

$$X_i = [Z_i^0, Z_i^1, \ldots, Z_i^N]^T, \tag{5}$$

where $Z_i = e^{s_i \Delta t}$. It can be shown that the matrix A is made up of a linear
combination of these mode vectors. For the least squares or pseudo-inverse
solution a square matrix $\Phi$, (N+1) by (N+1) for equation (2), is defined as

$$\Phi = A^T A.$$


This square matrix $\Phi$ will have N+1 eigenvectors and associated eigenvalues.
The N mode vectors defined by (5) will be linearly independent and will have
projections on N of the eigenvectors of the system. There will be one
eigenvector which is orthogonal to the N mode vectors and its eigenvalue will
be zero. Hence, the process for determining the order of the system is to
fill matrix $\Phi$ to some dimension M by M. The corresponding M eigenvalues of
the system are found and checked to see if one or more are equal to zero.
If there are L eigenvalues equal to zero, then N would be equal to M-L. If
L is not equal to one, then the matrix $\Phi$ is recomputed to order N+1 by N+1,
and the eigenvalues are regenerated. This final step is done because the
eigenvector corresponding to the zero eigenvalue is the vector of coefficients
of the difference equation.


This procedure works fine as long as the system response data does not
contain noise. With noisy data the theory starts to fail because the system
response is no longer exactly the sum of N exponentials. Reference [4] shows
that for small levels of noise with variance $\sigma^2$ the (N+1)th eigenvalue of
the $\Phi$ matrix should be equal to $\gamma\sigma^2$ instead of zero.
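The eigenvalue test can be sketched as follows. The code builds the data matrix for the homogeneous difference equation with a trial order, forms the product of its transpose with itself, and counts the eigenvalues above a cutoff. The function names and the cutoff argument `floor` are hypothetical; choosing that cutoff for noisy data is exactly the difficulty the text goes on to discuss, and a value tied to the number of equations times the noise variance is only one possible starting point.

```python
import numpy as np

def phi_eigenvalues(samples, trial_order):
    """Eigenvalues (largest first) of Phi = A^T A for a trial order."""
    R = np.asarray(samples, dtype=float)
    cols = trial_order + 1
    rows = len(R) - trial_order            # number of difference equations
    A = np.array([R[k:k + cols] for k in range(rows)])
    return np.linalg.eigvalsh(A.T @ A)[::-1]

def estimate_order(samples, trial_order, floor):
    """Count eigenvalues above `floor`, a user-chosen cutoff."""
    return int(np.sum(phi_eigenvalues(samples, trial_order) > floor))
```

With noise-free samples of a two-pole response and a trial order of 4, three of the five eigenvalues are zero to rounding error, so any small positive cutoff returns an order of 2.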




Eigenvalue  Examples  and  Difficulties 

Figure  2  shows  an  example  of  the  thirteen  eigenvalues  for  a  twelve  pole 
system.  Note  that  for  the  no  noise  case,  the  thirteenth  eigenvalue  is  less 
than  lO”-^  or  practically  zero.  For  the  noise  case  the  thirteenth  eigenvalue 
is equal to about $3 \times 10^{-3}$. The value of $\gamma\sigma^2$ for this case is $2.5 \times 10^{-3}$.
From  the  example  of  Figure  2  it  appears  that  there  should  be  no  difficulty 
in  deciding  what  the  proper  order  of  the  system  is  even  in  noisy  data.  This 
is  theoretically  the  case.  However  the  following  example  shows  some  inherent 
difficulties . 

Table 1 presents the results of twenty Monte Carlo trials performed for
seven different sets of noisy data. The signal to noise ratio ranged from
3.3 dB to 35.4 dB. Note that while the predicted value of the (N+1)th
eigenvalue was very close to the mean of the actual numerically determined
value, the standard deviation of the poles is very large when the noise level
is high. Hence, while the proper order was determined, the poles resulting from
this model order were in error. Experience has shown that picking a higher
order model will give better accuracy to the resulting true poles. However,
there will be extraneous poles also present which, if nothing is known about
the system, will potentially be hard to distinguish from the true poles. Hence,
while the proper rank of the $\Phi$ matrix has been found, the resulting answer
will not necessarily be that which is sought. In addition, further examples
have shown that as the noise level increases it is difficult to determine
the exact cutoff point. That is, the difference between the Nth and
the (N+1)th eigenvalue is so small that a true cutoff point is difficult to
determine. This problem is shown graphically in Figure 3.

Preliminary efforts have shown that the use of recursive techniques to
solve the matrix equation once the proper rank has been determined reduces
the large errors described above and shown in the example of Table 1. The
determination of the proper rank is still a problem when faced with the
difficulty illustrated in Figure 3.


The  HFTI  Method 


Another approach to the determination of the rank of the system of
equation (3), where the A matrix is $\gamma$ by N, is given by the HFTI algorithm
[5]. The HFTI algorithm, described in detail in [5], is designed specifically
to give a least squares solution to a rank-deficient problem by
using Householder transformations. The method transforms the matrix-vector
combination [A:b] of (4) to a matrix-vector combination [R:c] using
premultiplying Householder transformations with column interchanges. All
subdiagonal elements in the matrix R are zero and its diagonal elements satisfy
$|r_{ii}| \ge |r_{i+1,i+1}|$, $i = 1, \ldots, \ell-1$, where $\ell = \min[\gamma, N]$. The proper
rank of the system can be found by comparing the diagonal values, $|r_{ii}|$, and
looking for a cutoff as was done for the eigenvalues of the previous section.
The subroutine HFTI will accept an input parameter $\tau$ which it compares with
the diagonal elements to determine a new pseudorank k for the system. The
pseudorank k is defined as:

$$k = i \quad \text{when} \quad |r_{ii}| > \tau \ \text{ and } \ |r_{i+1,i+1}| \le \tau.$$


The method then calculates a minimal length solution vector x for the problem
defined by the first k rows of [R:c].

Figure 4 shows an example of the resulting diagonal elements $|r_{ii}|$. HFTI
was used on a fourth order system represented by a rank deficient matrix of
dimensions 20 by 6. The figure shows the results for both clean and noisy
response data. The problem here is the same as in the eigenvalue method.
That is, it is difficult to know what value of $\tau$ should be specified so that
the proper rank of the system can be determined.
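The pseudorank test can be illustrated without the HFTI subroutine itself. The sketch below is not HFTI; it is a small NumPy implementation of Householder triangularization with column interchanges, written for this example, that returns only the diagonal magnitudes and then counts those above a tolerance. The function names and the pivoting rule (largest remaining column norm first) are assumptions.

```python
import numpy as np

def pivoted_qr_diagonal(A):
    """Diagonal magnitudes |r_ii| from Householder QR with column interchanges."""
    R = np.array(A, dtype=float)
    rows, cols = R.shape
    diag = []
    for i in range(min(rows, cols)):
        # Column interchange: move the remaining column of largest norm to position i.
        norms = np.linalg.norm(R[i:, i:], axis=0)
        p = i + int(np.argmax(norms))
        R[:, [i, p]] = R[:, [p, i]]
        # Householder reflection that zeroes the subdiagonal part of column i.
        v = R[i:, i].copy()
        norm_v = np.linalg.norm(v)
        if norm_v > 0.0:
            v[0] += np.copysign(norm_v, v[0])
            v /= np.linalg.norm(v)
            R[i:, i:] -= 2.0 * np.outer(v, v @ R[i:, i:])
        diag.append(abs(R[i, i]))
    return np.array(diag)

def pseudorank(diag, tau):
    """Largest k with |r_kk| > tau, given a non-increasing diagonal."""
    return int(np.sum(diag > tau))
```

For a noise-free two-pole response arranged as a 20 by 6 matrix, only the first two diagonal magnitudes are significantly different from zero, so a small tolerance yields a pseudorank of 2. With noisy data the trailing values rise and the choice of tolerance becomes the open question described above.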


Summary  and  Conclusions 

The two previous sections briefly discussed two methods for determining
the order or rank of a system so that the proper poles can be extracted.
While the methods are basically straightforward, their actual application
is limited if noise is present in the data. The problem which
now must be addressed is how one determines the optimum cutoff point for
the eigenvalue method or the $\tau$ parameter for the HFTI method. It appears
that numerical parameter studies need to be run to get a better feel for
this procedure. Preliminary results are encouraging for obtaining an
optimal cutoff procedure so that the entire resonance extraction procedure
can be automated.


References 


1. Michael L. Van Blaricum, "A Review of Prony's Method Techniques for
Parameter Estimation," 1978 RADC Spectral Estimation Workshop.

2. J.D. Lawrence, "Polynomial Root Finder," Lawrence Livermore Laboratory,
Livermore, CA, CIC Note C212-010, January 1966.

3. D.H. Schaubert, "Application of Prony's Method to Time Domain
Reflectometer Data and Equivalent Circuit Synthesis," HDL-TR-1857, Harry
Diamond Labs, June 1978.

4. M.L. Van Blaricum and R. Mittra, "Problems and Solutions Associated with
Prony's Method for Processing Transient Data," IEEE Trans. on Ant. and
Prop., January 1978, pp. 174-182.

5. C.L. Lawson and R.J. Hanson, Solving Least Squares Problems, Prentice
Hall.


TABLE 1. Results for Example 2:
$R(t) = e^{-2t}\sin(\pi t)$, $\gamma = 50$, $\Delta t = 0.1$ s




[Plot residue: eigenvalue magnitudes on a logarithmic axis. Legend: x - No Noise (Clean Data), o - Noisy Data.]

FIGURE 3. Example Showing the Lack of a Clean Breakpoint or Knee for a
Fourth Order Noisy Data Response as Compared to the Clean Data Response.

FIGURE 4. Results of $|r_{ii}|$ from HFTI using a Four Pole System with and
without Noise. The Matrix was Rank Deficient at Dimensions 20 by 6.


A  UNIFYING  MODEL  FOR  SPECTRAL  ESTIMATION 

CHARLES  BYRNE 

The  Catholic  University  of  America 
Washington,  D.C.  20064 

and 

RAYMOND  FITZGERALD 

Naval  Research  Laboratory 
Washington,  D.C.  20375 


Abstract 


The Fourier transform of a band-limited time function cannot
be determined from finitely many observations. For this
reason spectral estimation necessarily involves the substitution,
for the original Fourier transform, of a function that is so
determined. Some approaches make explicit the nature of the
substitution, by assuming that the transform has a fairly simple
form (a rational function, for example), whose parameters can be
computed from the data. With other methods, such as those that
rely on time-domain extrapolation or on iterative approximations,
it is not always clear what substitution the method
is introducing.

In this article we discuss an explicit model, or substitute,
for the Fourier transform, based on over-sampled data. Our
optimal windowing or modified DFT model is seen to coincide with the
minimum energy estimate of de Figueiredo. Upon examining the
methods of Cadzow and of Kolba and Parks, we find this same model
implicit in both techniques.


Introduction 


According to the sampling theorem, a band-limited signal of
finite energy can be reconstructed from uniformly spaced samples,
provided the interval between sample times is small enough.
Indeed, if x(t) is a complex function of a real variable t, of
finite energy, and its Fourier transform,

$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{i\omega t}\, dt, \tag{1}$$

is zero for $\omega$ not in the band $W = [-\sigma, \sigma]$, then we have that

$$X(\omega) = \Delta \sum_{n=-\infty}^{\infty} x(n\Delta)\, e^{in\Delta\omega}, \qquad |\omega| \le \pi/\Delta, \tag{2}$$

for any sampling rate $0 < \Delta < \pi/\sigma$. Taking the inverse Fourier
transform, we get, for all t,

$$x(t) = \sum_{n=-\infty}^{\infty} x(n\Delta)\, \frac{\sin(n\Delta\sigma - t\sigma)}{n\pi - t\pi/\Delta}. \tag{3}$$


In practice, because our observations are necessarily limited
to some finite interval of time, say [-d, d], only finitely
many of the Fourier coefficients, $x(n\Delta)$, will be known. As has
been noted by Levi [4], an arbitrarily band-limited signal can be
made to pass through any finite set of points. Consequently, any
procedure that estimates $X(\omega)$ from finitely many observations
introduces a substitute, or model, for $X(\omega)$ that is determined by
the data, or what is the same thing, makes additional assumptions
about $X(\omega)$ which make it possible to reconstruct $X(\omega)$ from our
limited knowledge. These models for the Fourier transform to be
estimated are, at times, explicitly described, as in [2], where
de Figueiredo uses the criterion of minimum energy to derive a
model using splines. Other explicit models appear in the host of
articles on rational approximation and ARMA schemes. With other
approaches, such as those that rely on time-domain extrapolation
or iterative approximation, the exact nature of the model being
introduced is not always clear. Nevertheless, the operative
model is the essence of the procedure and must be clearly
understood before that procedure can be adequately compared with other
methods.

In this article we propose an explicit model for the
estimation of the Fourier transform from over-sampled data. We derive
it as a best approximation to the unknown Fourier transform, and
then note that it can also be derived as a minimum energy
estimate, as was done in [2]. One can also view this model as a
windowing procedure that uses coefficients that are optimal in
the sense described below, and are dependent upon the data. We
then turn to a study of implicit models, dealing specifically
with the recent work of Cadzow [1] and of Kolba and Parks [3].
As we discover, the extrapolation suggested by both of these
methods can be by-passed, and if this is done, the operative model
becomes the same optimal windowing presented earlier. It is in
this sense that our model is described as "unifying".




An  Optimal  Windowing  Model 


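Suppose that our data consists of the values $x(-M\Delta), \ldots, x(M\Delta)$,
where $0 < \Delta < \pi/\sigma$. The truncated DFT model for $X(\omega)$ is

$$X(\omega) = \Delta \sum_{j=-M}^{M} x(j\Delta)\, e^{ij\Delta\omega}, \qquad |\omega| \le \sigma. \tag{4}$$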


This model is relatively easy to compute as an FFT and is
optimal, in the sense that it is the polynomial of the form

$$\Delta \sum_{j=-M}^{M} a(j\Delta)\, e^{ij\Delta\omega} \tag{5}$$

that minimizes the mean square error

$$\int_{-\pi/\Delta}^{\pi/\Delta} \left| X(\omega) - \Delta \sum_{j=-M}^{M} a(j\Delta)\, e^{ij\Delta\omega} \right|^2 d\omega. \tag{6}$$


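However, this model can be improved if we know that the data is
over-sampled, and that $X(\omega)$ is zero off the band $W = [-\sigma, \sigma]$,
where $\sigma < \pi/\Delta$. It is reasonable then to minimize not (6), but

$$\int_{-\sigma}^{\sigma} \left| X(\omega) - \Delta \sum_{j=-M}^{M} a(j\Delta)\, e^{ij\Delta\omega} \right|^2 d\omega. \tag{7}$$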


According to the orthogonality principle (see, for example,
[5], p. 197), we must then have

$$\int_{-\sigma}^{\sigma} \left( X(\omega) - \Delta \sum_{j=-M}^{M} a(j\Delta)\, e^{ij\Delta\omega} \right) e^{-ik\Delta\omega}\, d\omega = 0 \tag{8}$$

for k = -M, ..., M. It follows that the $a(j\Delta)$ must satisfy the
system of linear equations

$$x(k\Delta) = \Delta \sum_{j=-M}^{M} a(j\Delta)\, \frac{\sin((k-j)\Delta\sigma)}{(k-j)\Delta\pi}, \tag{9}$$

for k = -M, ..., M. It is a happy result that these optimal
coefficients are completely determined by the data. When the
model for $X(\omega)$ is (5), with coefficients obtained from (9), we
shall say that the model is the modified DFT (MDFT). This MDFT
model can be viewed as a data-dependent windowing, with optimal
coefficients.
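A minimal NumPy sketch of the MDFT computation follows: it solves the linear system for the optimal coefficients and evaluates the resulting trigonometric polynomial. The function names are hypothetical, the samples are assumed to be ordered from index -M to M, and a direct call to `numpy.linalg.solve` is used for simplicity. The sinc-type matrix becomes poorly conditioned as the number of samples or the degree of over-sampling grows, so a regularized or least-squares solve may be preferable in practice.

```python
import numpy as np

def mdft_coefficients(x, delta, sigma):
    """Solve for a(j*delta), j = -M..M, given the samples x(j*delta)."""
    x = np.asarray(x, dtype=complex)
    idx = np.arange(len(x))
    diff = idx[:, None] - idx[None, :]                    # k - j
    # sin((k-j)*delta*sigma) / ((k-j)*delta*pi), equal to sigma/pi when k == j
    S = (sigma / np.pi) * np.sinc(diff * delta * sigma / np.pi)
    return np.linalg.solve(delta * S, x)

def mdft(a, delta, omega):
    """Evaluate delta * sum_j a_j exp(i*j*delta*omega); meant for |omega| <= sigma."""
    M = (len(a) - 1) // 2
    j = np.arange(-M, M + 1)
    return delta * np.exp(1j * np.outer(omega, j) * delta) @ a
```

With seven samples at unit spacing and a band limit of pi/2, for example, the coefficients reproduce the data exactly when substituted back into the linear system, which is the defining property of the model.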




In  the  recent  article  [2]  de  Figueiredo  presents  a  spline 
approximation  technique  for  spectral  estimation  that  employs  a 
minimum  energy  criterion.  It  is  easily  verified  that  our  MDFT 
model  coincides  with  his,  and  so  has  minimum  energy,  subject  to 
the  data  and  the  band  W.  The  reader  is  also  directed  to  the 
article  [4],  where  the  same  model  is  discussed  in  a  somewhat 
different  context. 

Two  Extrapolation  Methods  and  their  Implicit  Models 

In [1] Cadzow suggests that we by-pass his sometimes
slowly converging iterative procedure for estimating the Fourier
transform and adopt a one-step extrapolation method. Involved
in this method is the assumption that there is a function z(t),
of finite energy, satisfying the integral equation

$$x(t) = \int_{-d}^{d} z(s)\, \frac{\sin((t-s)\sigma)}{(t-s)\pi}\, ds, \tag{10}$$

whose Fourier transform agrees with $X(\omega)$ on the band W. Knowing
only $x(-M\Delta), \ldots, x(M\Delta)$, we approximate the integral equation
by a system of linear equations

$$x(k\Delta) = \Delta \sum_{j=-M}^{M} z(j\Delta)\, \frac{\sin((k-j)\Delta\sigma)}{(k-j)\Delta\pi}, \tag{11}$$

for k = -M, ..., M. Once we have the values of $z(j\Delta)$, we can
extrapolate $x(q\Delta)$, $|q| > M$, using (11), with $q\Delta$ replacing $k\Delta$.
The Fourier transform is estimated from a sufficiently expanded
set of samples and extrapolations. However, we may proceed
somewhat differently. Having computed the solution of (11), we
are able to write $X(\omega)$ in closed form; using (2) and (11)
and rearranging terms we obtain, with $h(t) = \sin(\sigma t)/(\pi t)$,

$$X(\omega) = \Delta \sum_{j=-M}^{M} z(j\Delta) \left[ \Delta \sum_{n=-\infty}^{\infty} h(n\Delta - j\Delta)\, e^{i(n-j)\Delta\omega} \right] e^{ij\Delta\omega}, \tag{12}$$

where the term in square brackets is the Fourier expansion of
the function that is 1 on W and 0 off of W. So the operative
model is the MDFT. Of course, if the extrapolation is used,
the MDFT model is only approximated, due to the truncation that
necessarily results.

Let us turn to the extrapolation method of Kolba and Parks
[3]. Consider equation (3), with $t = k\Delta$. We then have, for each
fixed value of k,

$$x(k\Delta) = \sum_{n=-\infty}^{\infty} x(n\Delta)\, \frac{\sin((k\Delta - n\Delta)\sigma)}{(k-n)\pi}. \tag{13}$$

Viewing (13) as the inner product of two sequences, $\{x(n\Delta)\}$
representing $X(\omega)$, and $f_k = \{\sin((k\Delta - n\Delta)\sigma)/((k-n)\pi)\}$, we see that the
second sequence represents the linear functional that extracts
the $k\Delta$-th coefficient. Knowing only $x(k\Delta)$, $k = -M, \ldots, M$,
suggests that we approximate $f_m$, $|m| > M$, by a linear combination of
the $f_k$, $|k| \le M$, labeled $f'_m$. So

$$f'_m = \sum_{j=-M}^{M} b_{mj} f_j. \tag{14}$$


We seek optimal coefficients, so applying the orthogonality
principle, we get

$$\frac{\sin((m\Delta - k\Delta)\sigma)}{(m\Delta - k\Delta)\pi} = \sum_{j=-M}^{M} b_{mj}\, \frac{\sin((j\Delta - k\Delta)\sigma)}{(j\Delta - k\Delta)\pi} \tag{15}$$

for each k = -M, ..., M. If we let $b_m$ be the resulting solution
vector, the best extrapolation of our data is then given by

$$x(m\Delta) = b_m \cdot y, \qquad |m| > M, \tag{16}$$

where y is the vector containing the data, $x(k\Delta)$, $|k| \le M$. This
is the extrapolation method due to Kolba and Parks [3]. It is an
easy exercise to show that it is the same extrapolation
obtainable from the MDFT model. The extrapolation can, as before, be
by-passed, and $X(\omega)$ be obtained directly. The model is once
again the MDFT.
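The equivalence of the two extrapolations can be sketched as follows. The notation is introduced only for this sketch: S is the (2M+1) by (2M+1) symmetric matrix with entries $S_{kj} = \sin((k-j)\Delta\sigma)/((k-j)\pi)$, taken as $\Delta\sigma/\pi$ when k = j, $s_m$ is the vector with entries $S_{mk}$ for $|k| \le M$, a is the vector of MDFT coefficients, $\hat{x}(m\Delta)$ denotes an extrapolated value, and S is assumed to be invertible.

$$
\begin{aligned}
&\text{MDFT coefficients:} && y = S a \;\Rightarrow\; a = S^{-1} y, \\
&\text{MDFT extrapolation:} && \hat{x}(m\Delta) = \Delta \sum_{j=-M}^{M} a(j\Delta)\, \frac{\sin((m-j)\Delta\sigma)}{(m-j)\Delta\pi} = s_m^T a = s_m^T S^{-1} y, \\
&\text{orthogonality condition:} && s_m^T = b_m^T S \;\Rightarrow\; b_m^T = s_m^T S^{-1}, \\
&\text{best linear extrapolation:} && \hat{x}(m\Delta) = b_m \cdot y = s_m^T S^{-1} y .
\end{aligned}
$$

The second line is the inverse Fourier transform of the MDFT model, restricted to the band W, evaluated at $t = m\Delta$; the third line is the orthogonality condition multiplied through by $\Delta$.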


Conclusion 

The  MDFT  model  presented  here  is  an  optimal  data-dependent 
windowing,  as  well  as  the  minimum  energy  estimate.  It  is  also 
the  implicit  model  being  used  in  the  one-step  method  of  Cadzow, 
and  in  the  best  linear  extrapolator  procedure  of  Kolba  and  Parks. 
By  uncovering  the  implicit  model  we  are  able  to  provide  further 
justification  for  the  use  of  each  of  these  methods,  as  well  as  to 
make  unnecessary  the  comparisons  of  these  methods  that  focus 
needlessly on superficial differences in development, while
ignoring the common model.

It is also possible to derive the MDFT model using a maximum
entropy criterion, or as a moving average. The details will
be presented elsewhere.


References 


1. Cadzow, J.A., "An Extrapolation Procedure for Band-limited
Signals," IEEE Transactions ASSP-27, 4-12 (1979).

2. de Figueiredo, R., "Optimal Estimation of Essentially and
Strictly Band-limited Signals and their Spectrum by Generalized
Splines," IEEE ICASSP, April 1979, p. 194-199.

3. Kolba, D.P., and Parks, T.W., "Extrapolation and Spectral
Estimation for Band-limited Signals," IEEE ICASSP, April 1978,
p. 372-374.

4. Levi, L., "Fitting a Band-limited Signal to Given Points,"
IEEE Transactions on Information Theory, Volume IT-11, July
1965, p. 372-376.

5. Papoulis, A., Signal Analysis, McGraw-Hill, N.Y., 1977.


A  COMPARISON  OF  THE  BURG  AND  THE  KNOWN-AUTOCORRELATION 
AUTOREGRESSIVE  SPECTRAL  ANALYSIS  OF  COMPLEX  SINUSOIDAL 
SIGNALS  IN  ADDITIVE  WHITE  NOISE 


Robert  W.  Herring 

Communications  Research  Centre 
Shirley  Bay 

P.O.  Box  11490,  Station  "H" 
Ottawa,  Ontario  K2H  8S2 
CANADA 


Abstract 

Burg's algorithm for Maximum Entropy autoregressive spectral estimation is analyzed for the cases
of one and two complex sinusoidal signals in additive white noise. For the latter case, two biases
are found which can account for the line splitting and line shifting that occur in simulation studies when
the SNR is very high. These biases vanish completely if the two complex sinusoids are in phase quadrature
at the middle of the data record; if there is an integral number of half-cycles of difference frequency
contained in the data record, then the spectral estimate will be biased although the effects believed
to cause line-splitting will be eliminated. Results of simulation studies to support these
conjectures are presented.

1. Introduction

The Burg algorithm for the autoregressive spectral analysis of time-series data [1,2,3], sometimes
referred to as the maximum-entropy method (MEM), is known to be inappropriate for the case of sinusoidal
signals in additive white noise. This inappropriateness has been demonstrated both theoretically [4-6]
and in practice [7-9]. A theoretically correct model [4-6] for the generating process for an N-pole
complex sinusoidal signal in additive white noise is an N-pole, N-zero network with identical gain
weights in its feedback (pole) and feedforward (zero) parts, being excited with a white-noise input (see
Fig. 1). Autoregressive analysis models the generating process for the data as an all-pole network
excited by white noise. Since a zero in the generating process network can be simulated exactly only
by an infinite number of poles, it is clear that when autoregressive analysis is used, in principle an
infinite set of either autocorrelation or time series data must be used in order to achieve correct
results.

Fougere [9] has stated that, in the high signal-to-noise ratio (SNR) case, the Burg algorithm is
overconstrained. In the case of simulation studies, this overconstraint causes errors in the estimated
frequencies of spectral lines and the false splitting of spectral lines known to have been generated by
a single pole (or a pair of poles, in the case of real signals). Fougere has developed an algorithm
which avoids this phenomenon, but the algorithm is based on a gradient-search technique which lacks the
intrinsic efficiency of the unmodified Burg algorithm.

In spite of its known limitations, the Burg algorithm is often used because of its computational
efficiency. This report explores analytically the expected response of the Burg algorithm to
time-series data comprising one or two complex sinusoids, with and without the presence of additive white
noise. It is shown that only in very special cases does the Burg algorithm lead to the same results as
are achieved when the true autocorrelation functions of the signals are known.

2.  Review  of  Autoregressive  Spectral  Analysis 

Autoregressive spectral analysis is based on the idea that, if it is somehow possible to design a
feedforward (all-zero) filter which has as its input the data to be analyzed, and has as its output
random white noise, then the power spectrum of the input data is given by the reciprocal of the power
transfer function of the filter. Since this filter accounts for all the predictability inherent in the
input signal and has as its output only unpredictable random white noise, it is often referred to as a
prediction-error filter (PEF).

There are several well-known techniques for estimating the PEF corresponding to a given set of data.
When only amplitude time-series data are available, most of these depend upon estimates of the
autocorrelation function derived from the time-series data. The Burg algorithm, however, attempts to
avoid possible biases or inconsistencies in such estimates of the autocorrelation function by deriving
an estimate of the PEF coefficients directly from the data.

In  this  report  the  algorithm  for  generating  the  PEF  when  the  true  autocorrelation  function  is  known 
is  reviewed  in  Section  2.1  and  the  Burg  algorithm  is  reviewed  in  Section  2.2.  The  results  from  these 
sections  are  then  used  to  generate  the  sets  of  PEFs  corresponding  to  the  cases  of  one  and  two  complex 
sinusoids  in  additive  white  noise,  and  the  properties  of  these  sets  of  PEFs  are  compared  and  contrasted. 


2.1 The Known-Autocorrelation (KA) Algorithm [10,11]


Let it be assumed that N equispaced samples R(n) of the complex autocorrelation function have been
given, for n=0,1,2,...,N-1, where N may be finite or infinite. It is assumed that the Nyquist sampling
criterion has been met. Then the system of equations to be solved is

$$\sum_{m=0}^{M} R(k-m)\,\alpha(m,M) = -P(M)\,\delta(k), \qquad 0 \le k \le M, \quad 0 \le M \le N-1 \tag{1}$$

where the $\alpha(m,M)$s for m=0,1,2,...,M are sets of PEF coefficients, each value for M denoting a different
set; the P(M)s are called the output error powers and are real; and $\delta(k)=1$ if k=0, $\delta(k)=0$ otherwise, is
the Kronecker delta function. In order to maintain proper scaling, the leading terms of the PEFs,
$\alpha(0,M)$, M=0,1,...,N-1, are set equal to -1 by definition.


Since negative indices for R(k-m) occur in the set of equations (1), it is necessary to note that
R(-n)=R*(n), where the asterisk (*) denotes complex conjugation. This last fact allows (1) to be
rewritten in the alternative form:

$$\sum_{m=0}^{M} R(k'+m)\,\alpha^*(m,M) = -P(M)\,\delta(k'), \qquad -M \le k' \le 0, \quad 0 \le M \le N-1 \tag{2}$$


Equations  (2)  imply  that  the  same  result  is  obtained  if  the  complex  conjugate  of  a  PEF  is  applied  to  the 
time-reversed  autocorrelation  data.  This  "reverse-conjugate"  symmetry  is  used  in  the  derivation  of  the 
Burg  algorithm. 


If the set of linear equations (1) for a particular value of M is written in matrix form, it can be
seen that the matrix of autocorrelation samples [R(k,m)], where R(k,m)=R(k-m), is an (M+1)x(M+1) Toeplitz
matrix. Therefore (1) can be solved by applying the algorithm developed by Levinson, Robinson and Durbin,
otherwise known as the Levinson recursion. Following [11] the recursive solution to (1) can be written as:

$$P(0) = R(0) \tag{3a}$$

$$\alpha(M,M) = -\sum_{m=0}^{M-1} \alpha(m,M-1)\,R(M-m)\,/\,P(M-1) \tag{3b}$$

$$\alpha(m,M) = \alpha(m,M-1) - \alpha(M,M)\,\alpha^*(M-m,M-1), \qquad m = 1,2,\ldots,M-1 \tag{3c}$$

$$P(M) = (1 - |\alpha(M,M)|^2)\,P(M-1), \qquad M = 1,2,\ldots \tag{3d}$$

Note that at each successive stage of the recursion, the introduction of one new autocorrelation sample,
R(M), generates but one independent value $\alpha(M,M)$ for the Mth-order PEF; all other coefficients of the
Mth-order PEF are determined from linear combinations of the coefficients of the (M-1)th-order PEF and
their complex conjugates, using $\alpha(M,M)$ as indicated by (3c).


The $\alpha(M,M)$s are sometimes referred to as the reflection coefficients, because of the analogy of
their appearance in (3d) with a similar equation which occurs in the theory of a signal propagating
through a layered medium and being partially reflected at each layer interface.
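The recursion (3a)-(3d) can be sketched in a few lines. This is an illustrative reading of the equations under the sign convention that the leading PEF term is -1; the function name `levinson` and the numerical values (SNR of 4, unit noise variance, frequency 0.7 rad) are assumptions made for the example. The test signal is a single complex sinusoid in white noise, for which the coefficients have a simple closed form.

```python
import cmath

def levinson(R, order):
    """KA recursion (3a)-(3d).

    R[n] holds R(n) for n = 0..order. Returns (alpha, P) with
    alpha[M][m] = alpha(m, M), alpha(0, M) = -1, and P[M] = P(M).
    """
    alpha = [[-1.0 + 0j]]
    P = [R[0].real]                                   # (3a)
    for M in range(1, order + 1):
        prev = alpha[M - 1]
        # (3b): one new reflection coefficient from the new sample R(M)
        refl = -sum(prev[m] * R[M - m] for m in range(M)) / P[M - 1]
        cur = [-1.0 + 0j]
        for m in range(1, M):
            cur.append(prev[m] - refl * prev[M - m].conjugate())   # (3c)
        cur.append(refl)
        alpha.append(cur)
        P.append((1.0 - abs(refl) ** 2) * P[M - 1])   # (3d)
    return alpha, P

# Illustrative autocorrelation: |A|^2 = 4, noise variance 1, frequency 0.7 rad.
snr = 4.0
w1 = 0.7
R = [snr * cmath.exp(1j * k * w1) + (1.0 if k == 0 else 0.0) for k in range(4)]
alpha, P = levinson(R, 3)
```

With these inputs every coefficient of the order-M filter has magnitude snr/(M*snr + 1), so `alpha[3][2]` should equal (4/13)exp(j 1.4) and `P[3]` should equal 17/13.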


In executing the recursion of equations (3a-d) it can occur (at least in theory) that, for some
particular value of M, say $M_0$, $P(M_0)=0$. This implies that $|\alpha(M_0,M_0)| = 1$. This condition can arise only
in the case where the signal being analyzed can be modelled as $M_0$ complex sinusoids with no additive
noise (see Sections 3.1 and 3.2); in general $M_0$ is not finite. In particular, $M_0$ cannot be finite when
additive white noise is present [4].


For each order M of PEF, an estimate of the power spectrum based on (M+1) values of the
autocorrelation function is given by

$$X_{KA}(\omega,M) = \frac{P(M)}{\left|\sum_{m=0}^{M} \alpha(m,M)\exp(-jm\omega)\right|^2} \tag{4}$$

where $\omega$ is the normalized angular frequency in radians with $-\pi < \omega < \pi$, and the subscript KA refers to the
known-autocorrelation case. As discussed in Section 2.0 and as examination of (1) will indicate, the
PEFs are "spiking" or whitening filters, since all but one of their output values are zero. The power
spectrum of such an output signal is independent of frequency, or "white". The denominator in the
right-hand term of (4) is the power transfer function of the PEF, which if multiplied by $X_{KA}(\omega,M)$, the
estimated spectrum of the signal, yields the constant, white-noise spectrum P(M). Thus, since both P(M)
and the power spectrum of the PEF can be calculated, the signal power spectrum $X_{KA}(\omega,M)$ can be estimated
from (4) for successively higher orders of PEF.
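As a small numerical illustration of (4), the sketch below evaluates the spectrum of a first-order PEF at its peak and at the diametrically opposite frequency. The function name `ar_spectrum` and the coefficient values (a reflection coefficient of magnitude 0.8 at 0.7 rad and an error power of 1.8) are assumptions chosen for the example, not values taken from the text.

```python
import cmath
import math

def ar_spectrum(coeffs, power, omega):
    """Evaluate (4): power / |sum_m coeffs[m] * exp(-j*m*omega)|^2.

    coeffs[0] is the leading PEF term, which is -1 in this convention.
    """
    denom = sum(c * cmath.exp(-1j * m * omega) for m, c in enumerate(coeffs))
    return power / abs(denom) ** 2

w1 = 0.7
coeffs = [-1.0, 0.8 * cmath.exp(1j * w1)]       # order-1 PEF
peak = ar_spectrum(coeffs, 1.8, w1)             # at the sinusoid frequency
far = ar_spectrum(coeffs, 1.8, w1 + math.pi)    # half a turn away
```

At the peak the denominator is |-1 + 0.8|^2 = 0.04, and opposite the peak it is |-1 - 0.8|^2 = 3.24, which shows how a zero of the PEF near the unit circle produces a sharp spectral line.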


2.2 The Burg Algorithm

The Burg algorithm is a procedure for estimating the reflection coefficients directly from a set of
time-series amplitude data. It avoids the biases introduced into the spectral estimate when the
autocorrelation is estimated from the data and the known-autocorrelation algorithm is then applied; however,
as is shown in Sections 4.1 and 4.2, the Burg algorithm introduces biases of its own sort.

Let it be assumed that a set of N time-series amplitude data x(n), n=0,1,2,...,N-1 have been given,
and that $\langle x(n)\rangle = 0$, where the brackets $\langle\ \rangle$ denote expected value. The PEFs are derived sequentially. Each
successively higher order PEF is applied to the data in both directions simultaneously, and the average
of its forward and backward output error powers is minimized by adjusting only its reflection
coefficient. The remaining coefficients of each PEF depend on the sequence of reflection coefficients through
the functional relationship defined by (3c). The motivation for this procedure is its analogy with that
defined by (1), (2), and (3a-d).

Following e.g., [3,11-14] and taking proper note of the occurrences of complex conjugation in the
complex data case, the Burg algorithm can be written in a lattice-filter formulation:

$$f_M(n) = f_{M-1}(n) - \beta(M,M)\,b_{M-1}(n-1) \tag{5a}$$

$$b_M(n) = b_{M-1}(n-1) - \beta^*(M,M)\,f_{M-1}(n), \qquad n = M,M+1,\ldots,N-1; \quad M = 1,2,\ldots,N-1 \tag{5b}$$

$$f_0(n) = x(n) = b_0(n), \qquad n = 0,1,2,\ldots,N-1 \tag{5c}$$

(See Fig. 2, where $z^{-1}$ denotes the unit time-delay operator.) The series $f_M(n)$ is the output from the
Mth-order PEF applied to the input signal x(n) in the forward direction, and is expressed in terms of
the output series from the (M-1)th stage of the lattice. The series $b_M(n)$ is the output from the
Mth-order PEF, conjugated and applied to the input data in the reverse or backward direction, and again is
expressed in terms of the output series from the (M-1)th stage of the lattice. The $\beta(M,M)$s are the
reflection coefficients.

The sum of the forward and backward output error energies at each stage of the lattice is given by

$$E(M) = \sum_{n=M}^{N-1} \left(|f_M(n)|^2 + |b_M(n)|^2\right), \qquad M = 0,1,2,\ldots,N-1 \tag{6}$$

The formula for computing $\beta(M,M)$, the Burg estimate of the Mth-order reflection coefficient, is derived
by substituting eqns. (5a) and (5b) into (6), setting $\partial E(M)/\partial\beta^*(M,M) = 0$ and solving for $\beta(M,M)$.


$$\beta(M,M) = \frac{\sum_{n=M}^{N-1} 2\,b^*_{M-1}(n-1)\,f_{M-1}(n)}{\sum_{n=M}^{N-1} \left(|b_{M-1}(n-1)|^2 + |f_{M-1}(n)|^2\right)} \tag{7}$$

Notice the similarity of (7) to a single-lag unwindowed cross-correlation of the forward and backward
output series.


In order to obtain spectral estimates, it is usual to let the output error powers $\Pi$ be defined as

$$\Pi(0) = E(0)/(2N) \tag{8a}$$

and

$$\Pi(M) = (1 - |\beta(M,M)|^2)\,\Pi(M-1), \qquad M = 1,2,\ldots,N-1 \tag{8b}$$

by analogy with (3a) and (3d) respectively. Then, letting the $\beta(m,M)$s be defined by

$$\beta(m,M) = \beta(m,M-1) - \beta(M,M)\,\beta^*(M-m,M-1), \qquad m = 1,2,\ldots,M-1 \tag{9}$$

by analogy with (3c), and $\beta(0,M) = -1$ by definition, the Mth-order Burg power spectrum estimate $X_B(\omega,M)$
is given by

$$X_B(\omega,M) = \frac{\Pi(M)}{\left|\sum_{m=0}^{M} \beta(m,M)\exp(-jm\omega)\right|^2} \tag{10}$$

by analogy with (4).
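The lattice recursion (5a)-(9) translates fairly directly into code. The sketch below is one possible reading of those equations, not a reference implementation; the function name `burg` and the noise-free test signal (eight samples of a unit complex sinusoid at 0.7 rad) are assumptions chosen so that the expected result is obvious.

```python
import cmath

def burg(x, order):
    """Burg recursion following (5a)-(9).

    Returns (beta, Pi) with beta[M][m] = beta(m, M), beta(0, M) = -1,
    and Pi[M] = Pi(M).
    """
    N = len(x)
    f = list(x)                                   # f_0(n) = x(n)   (5c)
    b = list(x)                                   # b_0(n) = x(n)   (5c)
    beta = [[-1.0 + 0j]]
    Pi = [sum(abs(v) ** 2 for v in x) / N]        # (8a): E(0)/(2N)
    for M in range(1, order + 1):
        num = sum(2 * b[n - 1].conjugate() * f[n] for n in range(M, N))
        den = sum(abs(b[n - 1]) ** 2 + abs(f[n]) ** 2 for n in range(M, N))
        refl = num / den                          # (7)
        f_new, b_new = list(f), list(b)
        for n in range(M, N):
            f_new[n] = f[n] - refl * b[n - 1]                 # (5a)
            b_new[n] = b[n - 1] - refl.conjugate() * f[n]     # (5b)
        f, b = f_new, b_new
        prev = beta[M - 1]
        cur = [-1.0 + 0j]
        cur += [prev[m] - refl * prev[M - m].conjugate() for m in range(1, M)]  # (9)
        cur.append(refl)
        beta.append(cur)
        Pi.append((1 - abs(refl) ** 2) * Pi[M - 1])           # (8b)
    return beta, Pi

# Noise-free single complex sinusoid: the first reflection coefficient
# should be exp(j*w1) and the first-order error power should vanish.
w1 = 0.7
x = [cmath.exp(1j * n * w1) for n in range(8)]
beta, Pi = burg(x, 1)
```

Only order 1 is requested here because the error power is zero after the first stage for this signal, so the ratio in (7) would be undefined at the second stage.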




2.3  The  Zeroes  of  the  Prediction-Error  Filters  (PEFs) 

By employing standard z-transform techniques, the z-transforms of the PEFs for the KA case and the
Burg case can be written as polynomials in the complex variable z. For the KA case, this polynomial is
$F_{KA}(z,M)$ where

$$F_{KA}(z,M) = 1 - \sum_{m=1}^{M} \alpha(m,M)\,z^{-m} \tag{11}$$

Then (4) can be rewritten as

$$X_{KA}(\omega,M) = P(M)\,/\,|F_{KA}[\exp(j\omega),M]|^2 \tag{12}$$

where the denominator is the squared magnitude of $F_{KA}(z,M)$ evaluated around the unit circle ($|z| = 1$).
Similarly, for the Burg case, the polynomial is $F_B(z,M)$, where

$$F_B(z,M) = 1 - \sum_{m=1}^{M} \beta(m,M)\,z^{-m} \tag{13}$$

and (10) can be rewritten as

$$X_B(\omega,M) = \Pi(M)\,/\,|F_B[\exp(j\omega),M]|^2 \tag{14}$$

It is apparent from (12) and (14) that if any zeroes of $F_{KA}(z,M)$ or $F_B(z,M)$ lie near or on the unit
circle, then the magnitude of the spectral estimators $X_{KA}(\omega,M)$ or $X_B(\omega,M)$ will be large at locations on
the unit circle in the vicinity of such zeroes. Conversely, zeroes lying close to the origin of the
complex z-plane will have little influence on the peaks of the spectrum, but will affect its magnitude
away from the peaks. Thus some insight into the character of an autoregressive spectral estimate can be
gained by studying the locations of the zeroes of its associated PEF.


3.  The  One-Pole  Complex  Sinusoid  Case 

The formula for a complex signal $x_1(n)$ consisting of a single complex sinusoid in the presence of
additive white noise is given by

$$x_1(n) = A_1\exp(jn\omega_1) + \epsilon(n) \tag{15}$$

where n is any positive or negative integer, or zero; $A_1$ is the complex amplitude of the complex
sinusoid; $\omega_1$ is the angular frequency of the complex sinusoid, normalized so that $-\pi < \omega_1 < \pi$; and $\epsilon(n)$ is
additive white noise having the property

$$\langle \epsilon^*(n)\,\epsilon(n+k)\rangle = |\epsilon|^2\,\delta(k) \tag{16}$$

where $|\epsilon|^2$ is used to denote the variance of $\epsilon(n)$ and $\delta(k)$ is again the Kronecker delta function.


3.1 The KA Estimate of the PEF

The autocorrelation function $R_1(k)$ of the signal defined by (15) is given by

$$R_1(k) = |A_1|^2\exp(jk\omega_1) + |\epsilon|^2\,\delta(k) \tag{17}$$

where k is any positive or negative integer, or zero, and in general R(k) is defined by

$$R(k) = \langle x^*(n)\,x(n+k)\rangle \tag{18}$$

Substitution of (17) into (3a) gives the result

$$P_1(0) = |\epsilon|^2(\sigma_1^2 + 1) \tag{19}$$

where $\sigma_1^2 = |A_1|^2/|\epsilon|^2$ is the signal-to-noise ratio (SNR). If there is no noise, i.e. $|\epsilon|^2=0$, then
$P_1(0)=|A_1|^2$.

The following general result can be derived from (3b), (3c) and (3d) when $|\epsilon|^2 \ne 0$ and $|A_1|^2 \ne 0$:

$$\alpha_1(m,M) = [\sigma_1^2/(M\sigma_1^2+1)]\exp(jm\omega_1) \tag{20a}$$

$$\alpha_1(m,M) = [1/(M+\sigma_1^{-2})]\exp(jm\omega_1), \qquad M = 1,2,3,\ldots,\infty; \quad m = 1,2,3,\ldots,M \tag{20b}$$

and

$$P_1(M) = |\epsilon|^2\,[(M+1)\sigma_1^2+1]\,/\,[M\sigma_1^2+1] \tag{21}$$


In the noise-free case ($|\epsilon|^2 = 0$ and $\sigma_1^2 \to \infty$), (20a) or (20b) imply that $\alpha_1(1,1)=\exp(j\omega_1)$ so that
$|\alpha_1(1,1)|^2 = 1$. Then (3d) implies that $P_1(1)=0$. Thus, for this special case, the sequences defined by
(20) and (21) terminate at M=1. Otherwise for this signal model the sequences are infinite, with
$P_1(M) \to |\epsilon|^2(1+M^{-1})$ as $M\sigma_1^2 \to \infty$.

The z-transform of the KA PEF (cf. (11)) is

$$F_{KA}^{(1)}(z,M) = 1 - [\sigma_1^2/(M\sigma_1^2+1)]\sum_{m=1}^{M} z^{-m}\exp(jm\omega_1) \tag{22}$$

where the superscript (1) denotes a one-pole complex sinusoidal signal. For M=1, $F_{KA}^{(1)}(z,1)$ has a zero
at $z_0=[\sigma_1^2/(\sigma_1^2+1)]\exp(j\omega_1)$. For M=2, $F_{KA}^{(1)}(z,2)$ has zeroes at

$$z_0 = \frac{\sigma_1\left[\sigma_1 \pm \sqrt{9\sigma_1^2+4}\,\right]}{2(2\sigma_1^2+1)}\,\exp(j\omega_1) \tag{23}$$

so that for high SNR ($1 \ll \sigma_1^2 < \infty$), the zeroes occur at approximately $1.0\exp(j\omega_1)$ and $-0.5\exp(j\omega_1)$,
and for low SNR ($\sigma_1^2 \ll 1$), the zeroes occur at approximately $\pm\sigma_1\exp(j\omega_1)$.

Some algebraic manipulation shows that, for the product $M\sigma_1^2$ sufficiently greater than 1, an
approximate root $z_0$ of (22) is given by

$$z_0 \approx \left[1 - \frac{2}{(M+1)\,M\,\sigma_1^2}\right]\exp(j\omega_1) \tag{24}$$

This means that, as $M\sigma_1^2 \to \infty$, the estimated location of the pole corresponding to the single complex
sinusoid asymptotically approaches the correct location $\exp(j\omega_1)$ on the complex z-plane along a radius
oriented at an angle corresponding to the true frequency of the signal. This is true even when $\sigma_1^2 \ll 1$.
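The closed-form zero locations for the one-pole case can be checked against a direct solution of the order-2 polynomial. In the sketch below the function names and the parameter values (SNR of 4 and of 100, frequency 0.7 rad) are illustrative assumptions; the quadratic is obtained by multiplying the order-2 z-transform through by z squared.

```python
import cmath

def zeros_direct(snr, w1):
    """Roots of z^2 - c*e^{jw1}*z - c*e^{j2w1} = 0, with c = snr/(2*snr + 1)."""
    c = snr / (2 * snr + 1)
    a1 = c * cmath.exp(1j * w1)
    a2 = c * cmath.exp(2j * w1)
    disc = cmath.sqrt(a1 * a1 + 4 * a2)
    roots = [(a1 + disc) / 2, (a1 - disc) / 2]
    return sorted(roots, key=abs, reverse=True)      # larger-radius zero first

def zeros_closed_form(snr, w1):
    """Closed-form order-2 zero locations, same ordering as zeros_direct."""
    s = snr ** 0.5
    root = (9 * snr + 4) ** 0.5
    scale = 2 * (2 * snr + 1)
    e = cmath.exp(1j * w1)
    roots = [s * (s + root) / scale * e, s * (s - root) / scale * e]
    return sorted(roots, key=abs, reverse=True)

def approx_signal_zero(M, snr, w1):
    """Approximate signal zero for M*snr much greater than 1."""
    return (1 - 2 / ((M + 1) * M * snr)) * cmath.exp(1j * w1)

direct = zeros_direct(4.0, 0.7)
closed = zeros_closed_form(4.0, 0.7)
high_snr_zero = zeros_direct(100.0, 0.7)[0]
approx_zero = approx_signal_zero(2, 100.0, 0.7)
```

At an SNR of 4 the two zeroes have radii of about 0.925 and 0.481. At an SNR of 100 the direct and approximate signal zeroes differ by roughly 1.5e-5 in radius, consistent with the approximation holding when the product of order and SNR is large.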

Numerical solutions of (22) show that the other zeroes tend to distribute themselves with
approximately uniform angular separation and approximately constant radius inside the unit circle so as to
account for the uniform spectrum of the additive white noise. The radius at which the zeroes occur
varies inversely with the SNR.

3.2 The Burg Estimate of the PEF

Let it be assumed that N samples of the signal defined by (15) have been given:

$$x_1(n) = A_1\exp(jn\omega_1) + \epsilon(n), \qquad n = 0,1,2,\ldots,N-1 \tag{25}$$

Substituting (25) into (6) by using (5c) and then taking the expected value of $E_1(0)$ yields

$$\langle E_1(0)\rangle = 2N|\epsilon|^2(\sigma_1^2 + 1) \tag{26}$$

It is not so easy to calculate $\langle\beta_1(1,1)\rangle$, since examination of (7) shows that it is necessary to
derive the expected value of a quotient of correlated random variables. In general, this requires that
the statistics of the random variables be specified and the problem be solved numerically. This will
not be done here.

However, for sufficiently high SNR (e.g., $\sigma_1^2 > 100$ or 20 dB) the approximation $(1+x)^{-1} \approx 1-x$, where
$x \approx \epsilon(n)/A_1$, can be used to approximate the denominator of (7). Then the following approximate result,
which is independent of the statistical distribution of the white noise $\epsilon(n)$, is obtained:


$$\langle\beta_1(1,1)\rangle \approx \{1/[1+B_1(1,N)\,\sigma_1^{-2}]\}\exp(j\omega_1) \tag{27}$$

where

$$B_1(1,N) = (N^2-2)/(N-1)^2 \tag{28}$$

and $2 \ge B_1(1,N) \ge 1$ for $2 \le N \le \infty$. Comparison of (27) with (20b) for M=1 shows that for high SNR $\beta_1(1,1)$ is a
biased estimate of $\alpha_1(1,1)$. However, $\beta_1(1,1)$ correctly estimates the angular frequency $\omega_1$ of the
single complex sinusoid, and the bias term $B_1(1,N)$ monotonically approaches unity as N becomes large.

From (26), (27) and (8a) and (8b) it can be shown that

$$\langle\Pi_1(0)\rangle = |\epsilon|^2(\sigma_1^2 + 1) \tag{29}$$

and, to the same degree of approximation as was used to obtain (27),

$$\langle\Pi_1(1)\rangle \approx 2B_1(1,N)\,|\epsilon|^2 \tag{30}$$

Comparison of (30) with (21) shows that $\Pi_1(1)$ is a biased estimator of $P_1(1)$, since for $\sigma_1^2$ large,
$P_1(1) \approx 2|\epsilon|^2$.
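The step to the first-order Burg error power can be sketched as follows. This is a heuristic outline that replaces the expected value of a product by the product of expected values and keeps only first-order terms in the inverse SNR; both simplifications are assumptions of this sketch rather than statements from the text.

$$
\begin{aligned}
|\langle\beta_1(1,1)\rangle|^2 &\approx [1+B_1(1,N)\,\sigma_1^{-2}]^{-2} \approx 1 - 2B_1(1,N)\,\sigma_1^{-2},\\
\langle\Pi_1(1)\rangle &\approx \left(1-|\langle\beta_1(1,1)\rangle|^2\right)\langle\Pi_1(0)\rangle
\approx 2B_1(1,N)\,\sigma_1^{-2}\,|\epsilon|^2(\sigma_1^2+1) \approx 2B_1(1,N)\,|\epsilon|^2 .
\end{aligned}
$$

For comparison, the known-autocorrelation value at M=1 is $|\epsilon|^2(2\sigma_1^2+1)/(\sigma_1^2+1) \approx 2|\epsilon|^2$, so the two differ by the factor $B_1(1,N)$, which lies between 1 and 2.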

The result (30) implies that the lattice-filter outputs $f_1(n)$ and $b_1(n)$ as defined by (5a) and (5b)
have low SNR, since their expected power is at most only a factor of four greater than the additive
white noise power $|\epsilon|^2$. Therefore it would again be necessary to specify the detailed statistics of the
additive noise before the higher order reflection coefficients could be estimated. This will not be done
here.




3.3  Discussion  of  the  One-Pole  Complex  Sinusoid  Case 

The results of Sections 3.1 and 3.2 show that, for high SNR ($\sigma_1^2 > 100$) the first order (M=1) PEF
generated by the Burg algorithm is biased as compared to the PEF generated by the KA technique. This bias,
however, monotonically decreases as the number of data N is increased, and both the KA and the Burg
algorithms correctly estimate the frequency of the complex sinusoid.

It  is  impossible  to  investigate  the  properties  of  the  Burg  PEFs  for  the  low  SNR  case,  or  for  orders 
higher  than  M=1  for  the  high  SNR  case,  without  specifying  the  statistical  distribution  of  the  additive 
white  noise.  This  problem  has  not  been  considered  here. 


4.  The  Two-Pole  Complex  Sinusoid  Case 

The sampled signal $x_2(n)$ consisting of two complex sinusoids in the presence of additive white
noise is given by

$$x_2(n) = A_1\exp(jn\omega_1) + A_2\exp(jn\omega_2) + \epsilon(n) \tag{31}$$

where $A_k = |A_k|\exp(j\phi_k)$ is the complex amplitude of the kth sinusoid, $\phi_k$ is its arbitrary initial phase at
n=0, and $\omega_k$ is its angular frequency, normalized so that $-\pi < \omega_k < \pi$, for k=1,2. $\epsilon(n)$ is additive white
noise, as in (15). It is apparent that if $A_2=A_1^*$ and $\omega_2=-\omega_1$ then $x_2(n)$ is a sampled real sinusoid in
additive white (complex) noise.

Equation (31) can be written in a form which will be subsequently more tractable:

$$x_2(n) = A_0\exp[j(n\omega_0+\phi_0)]\left\{r\exp[j(n\Delta\omega+\Delta\phi)] + r^{-1}\exp[-j(n\Delta\omega+\Delta\phi)]\right\} + \epsilon(n) \tag{32}$$

where $A_0=|A_1A_2|^{1/2}$ is the geometric mean of the magnitudes of the two amplitudes; $r=[|A_1|/|A_2|]^{1/2}$ is the
square root of the ratio of the magnitudes of the amplitudes; $\phi_0=(\phi_1+\phi_2)/2$ is the mean
initial phase; $\Delta\phi=(\phi_1-\phi_2)/2$ is one-half the difference between the two phases; $\omega_0=(\omega_1+\omega_2)/2$ is the mean
angular frequency; and $\Delta\omega=(\omega_1-\omega_2)/2$ is one-half the difference between the two frequencies.

4.1 The KA Estimate of the PEF

The autocorrelation function $R_2(k)$ of the signal defined by (32) is given by

$$R_2(k) = A_0^2(r^2+r^{-2})\exp(jk\omega_0)\,[\cos k\Delta\omega + j\rho(r)\sin k\Delta\omega] + |\epsilon|^2\,\delta(k) \tag{33}$$

where

$$\rho(r) = (r^2-r^{-2})/(r^2+r^{-2}) \tag{34}$$

and k is any positive or negative integer, or zero. Substitution of (33) into (3a) gives the result

$$P_2(0) = |\epsilon|^2(\sigma_2^2 + 1) \tag{35}$$

where $\sigma_2^2=A_0^2(r^2+r^{-2})/|\epsilon|^2$ is the SNR. If there is no noise, then $P_2(0) = A_0^2(r^2+r^{-2})$ is the signal power.

From (33), (35), (3b) and (3d), the following general results for the reflection coefficient and
the output error power for M=1 can be derived when $|\epsilon|^2 \ne 0$ and $A_0 \ne 0$:

$$\alpha_2(1,1) = \exp(j\omega_0)\left\{[\cos\Delta\omega + j\rho(r)\sin\Delta\omega]\,/\,[1+\sigma_2^{-2}]\right\} \tag{36}$$

$$P_2(1) = A_0^2(r^2+r^{-2})\left\{[1-\rho^2(r)]\sin^2\Delta\omega + \sigma_2^{-2}(2+\sigma_2^{-2})\right\}/\left\{1+\sigma_2^{-2}\right\} \tag{37}$$

For high SNR ($\sigma_2^2 \gg 1$), the first reflection coefficient is given approximately by

$$\alpha_2(1,1) = \exp(j\omega_0)\,[\cos\Delta\omega + j\rho(r)\sin\Delta\omega] \tag{38}$$

For r=1, $\rho(r)=0$ and the zero of $F_{KA}^{(2)}(z,1)$ (see Section 2.3) is located at $z_0=\cos\Delta\omega\,\exp(j\omega_0)$, which lies
on a radius oriented at the mean angular frequency $\omega_0$. This zero moves towards the limiting values of
$\exp(j\omega_1)$ as $r \to \infty$ and $\rho(r) \to 1$, or $\exp(j\omega_2)$ as $r \to 0$ and $\rho(r) \to -1$, and the single complex sinusoid case is
approached in each case.


The output error power is, using (38), (35) and (3d),

$$P_2(1) = (2A_0\sin\Delta\omega)^2/(r^2+r^{-2}) \tag{39}$$

which is essentially the signal power attenuated by the factor $[2\sin\Delta\omega/(r^2+r^{-2})]^2$. This factor is unity
when r=1 and $\Delta\omega = \pm\pi/2$, and decreases as r or $\Delta\omega$ deviate from these values.
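The first-order results for the two-sinusoid case can be checked numerically by building the autocorrelation directly from the two components and running one step of the KA recursion. The parameter values below (unit geometric-mean amplitude, r = 1.5, mean frequency 0.9 rad, half-separation 0.2 rad, noise variance 0.05) and all variable names are assumptions made for this example.

```python
import cmath
import math

A0, r, w0, dw, noise = 1.0, 1.5, 0.9, 0.2, 0.05
w1, w2 = w0 + dw, w0 - dw
A1, A2 = A0 * r, A0 / r                       # |A_1| and |A_2|

def R_direct(k):
    """Autocorrelation as a sum of the two sinusoid terms plus white noise."""
    val = A1 ** 2 * cmath.exp(1j * k * w1) + A2 ** 2 * cmath.exp(1j * k * w2)
    return val + (noise if k == 0 else 0.0)

S = A0 ** 2 * (r ** 2 + r ** -2)              # signal power
rho = (r ** 2 - r ** -2) / (r ** 2 + r ** -2)

def R_closed(k):
    """Same autocorrelation in mean/difference-frequency form."""
    shape = math.cos(k * dw) + 1j * rho * math.sin(k * dw)
    return S * cmath.exp(1j * k * w0) * shape + (noise if k == 0 else 0.0)

s = noise / S                                 # inverse SNR

# One step of the KA recursion, with alpha(0,0) = -1.
alpha11_rec = R_direct(1) / R_direct(0)
P1_rec = (1 - abs(alpha11_rec) ** 2) * R_direct(0).real

# Closed forms for the M = 1 reflection coefficient and error power.
c1 = math.cos(dw) + 1j * rho * math.sin(dw)
alpha11_closed = cmath.exp(1j * w0) * c1 / (1 + s)
P1_closed = S * ((1 - rho ** 2) * math.sin(dw) ** 2 + s * (2 + s)) / (1 + s)

# Noise-free limit of the error power, in two equivalent forms.
P1_noise_free = S * (1 - rho ** 2) * math.sin(dw) ** 2
P1_high_snr = (2 * A0 * math.sin(dw)) ** 2 / (r ** 2 + r ** -2)
```

The last two quantities agree because $1-\rho^2(r) = 4/(r^2+r^{-2})^2$, which is the identity that turns the general error-power expression into the high-SNR form.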


For low SNR ($\sigma_2^2 \ll 1$), the reflection coefficient and the output error power are given by

$$\alpha_2(1,1) = \exp(j\omega_0)\,\sigma_2^2\,[\cos\Delta\omega + j\rho(r)\sin\Delta\omega] \tag{40}$$

and

$$P_2(1) \approx |\epsilon|^2 + A_0^2(r^2+r^{-2}) \tag{41}$$

Thus the frequency estimated from the location of the zero of the M=1 order PEF lies in the range bounded
by $\omega_0 \pm \Delta\omega$, and the output error power is essentially the unattenuated signal plus noise power.

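For M=2, (33), (36) and (3b) can be combined to yield

$$\alpha_2(2,2) = -\exp(j2\omega_0)\,\frac{[1-\rho^2(r)]\sin^2\Delta\omega - \sigma_2^{-2}[\cos 2\Delta\omega + j\rho(r)\sin 2\Delta\omega]}{[1-\rho^2(r)]\sin^2\Delta\omega + \sigma_2^{-2}(2+\sigma_2^{-2})} \tag{42}$$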

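and (37), (42), and (3d) can be combined to yield

$$P_2(2) = |\epsilon|^2\,\frac{2[1-\rho^2(r)]\sin^2\Delta\omega\,(2+\cos 2\Delta\omega+\sigma_2^{-2}) + \sigma_2^{-2}\left[(2+\sigma_2^{-2})^2 - |\cos 2\Delta\omega + j\rho(r)\sin 2\Delta\omega|^2\right]}{(1+\sigma_2^{-2})\left\{[1-\rho^2(r)]\sin^2\Delta\omega + \sigma_2^{-2}(2+\sigma_2^{-2})\right\}} \tag{43}$$

For high SNR ($\sigma_2^2 \gg 1$) and

$$\sigma_2^2\,[1-\rho^2(r)]\sin^2\Delta\omega \gg |\cos 2\Delta\omega + j\rho(r)\sin 2\Delta\omega| \tag{44}$$

(42) reduces to

$$\alpha_2(2,2) \approx -\exp(j2\omega_0) \tag{45}$$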

It can be shown from (45), (36) and (3c) that

$$\alpha_2(1,2) = \exp(j\omega_0)\,2\cos\Delta\omega \tag{46}$$

so that, in the limit as the left-hand side of (44) approaches infinity, the zeroes of $F_{KA}^{(2)}(z,2)$ approach
$z_0=\exp[j(\omega_0 \pm \Delta\omega)]$, the true locations of the poles of the complex sinusoids. In this same limit,
$|\alpha_2(2,2)|^2=1$, so that $P_2(2)=0$ and the recursion terminates.

For the intermediate case, where $\sigma_2^2 \gg 1$ but (44) is not satisfied, i.e.,

$$\sigma_2^2\,[1-\rho^2(r)]\sin^2\Delta\omega \ll |\cos 2\Delta\omega + j\rho(r)\sin 2\Delta\omega| \tag{47}$$

(42) can be reduced to

$$\alpha_2(2,2) = \exp(j2\omega_0)\cos\Delta\omega\,[0.5\cos\Delta\omega + j\rho(r)\sin\Delta\omega] \tag{48}$$

and it can be shown from (48), (38) and (3c) that

$$\alpha_2(1,2) = 0.5\exp(j\omega_0)\,[\cos\Delta\omega + j\rho(r)\sin\Delta\omega] \tag{49}$$

so that the zeroes of $F_{KA}^{(2)}(z,2)$ occur at $[\cos\Delta\omega+j\rho(r)\sin\Delta\omega]\exp(j\omega_0)$ and $-0.5[\cos\Delta\omega+j\rho(r)\sin\Delta\omega]\exp(j\omega_0)$.
Comparison of these results with (23), which gives the zero locations of the PEF for a one-pole signal,
shows that the result is very similar to that for the one-pole signal at high SNR. Comparison with (38) shows that one of
the zeroes of $F_{KA}^{(2)}(z,2)$ remains the same as that of $F_{KA}^{(2)}(z,1)$ at high SNR; i.e., the two poles are
estimated as one by the M=2 PEF if (44) is not satisfied.
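The contrast between the resolved and unresolved regimes can be illustrated by computing the two zeroes of the order-2 KA filter from the general M=1 and M=2 reflection coefficients and the order-update relation (3c). The function name and the two parameter sets below are assumptions chosen for illustration: one with very high SNR and a moderate frequency separation, one with high SNR but a separation small enough that the resolution condition fails.

```python
import cmath
import math

def ka_order2_zeros(snr, r, w0, dw):
    """Zeroes of the order-2 KA PEF for two complex sinusoids in white noise."""
    rho = (r ** 2 - r ** -2) / (r ** 2 + r ** -2)
    s = 1.0 / snr
    c1 = math.cos(dw) + 1j * rho * math.sin(dw)
    c2 = math.cos(2 * dw) + 1j * rho * math.sin(2 * dw)
    a = (1 - rho ** 2) * math.sin(dw) ** 2
    a11 = cmath.exp(1j * w0) * c1 / (1 + s)                    # alpha_2(1,1)
    a22 = -cmath.exp(2j * w0) * (a - s * c2) / (a + s * (2 + s))   # alpha_2(2,2)
    a12 = a11 - a22 * a11.conjugate()                          # (3c) with M = 2
    # Zeroes of 1 - a12/z - a22/z^2, i.e. roots of z^2 - a12*z - a22 = 0.
    disc = cmath.sqrt(a12 * a12 + 4 * a22)
    return (a12 + disc) / 2, (a12 - disc) / 2

# Resolved: snr * sin^2(dw) is very large, zeroes near exp(j(w0 +/- dw)).
resolved = sorted(ka_order2_zeros(1e8, 1.0, 1.0, 0.3), key=cmath.phase)

# Unresolved: snr = 100 but snr * sin^2(dw) = 0.01, far below |cos(2*dw)|.
merged = ka_order2_zeros(100.0, 1.0, 1.0, 0.01)
merged_big = max(merged, key=abs)
merged_small = min(merged, key=abs)
```

In the second case the larger zero sits at radius about 0.997 on the mean-frequency radius and the smaller one near -0.49 on the same diameter, so the two sinusoids are represented by a single spectral line.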

For the low SNR case ($\sigma_2^2 \ll 1$), (42) reduces to

$$\alpha_2(2,2) = \exp(j2\omega_0)\left\{\sigma_2^2\,[\cos 2\Delta\omega + j\rho(r)\sin 2\Delta\omega]\right\} \tag{50}$$

and $P_2(2)$ is the same as $P_2(1)$ as given by (41). Since for this case both $|\alpha_2(2,2)|$ and $|\alpha_2(1,1)|$ are
proportional to $\sigma_2^2$, then from (3c) it is clear that, to first order in $\sigma_2^2$,

$$\alpha_2(1,2) \approx \alpha_2(1,1) \tag{51}$$

where $\alpha_2(1,1)$ is given by (40). The zeroes of $F_{KA}^{(2)}(z,2)$ in this case occur at approximately
$z=\sigma_2\{\pm[\cos 2\Delta\omega+j\rho(r)\sin 2\Delta\omega]^{1/2}+\sigma_2[\cos\Delta\omega+j\rho(r)\sin\Delta\omega]/2\}\exp(j\omega_0)$. For r=1 and thus $\rho(r)=0$, the zeroes
occur at $z=\sigma_2\{\pm[\cos 2\Delta\omega]^{1/2}+\sigma_2[\cos\Delta\omega]/2\}\exp(j\omega_0)$, which lie close to the origin of the complex z-plane,
on a diameter of the unit circle passing through the point $z=\exp(j\omega_0)$. For $r^2 \gg 1$ ($\rho(r) \to 1$) or $r^2 \ll 1$
($\rho(r) \to -1$) the zeroes tend to the locations $z=\sigma_2[\pm 1+\sigma_2/2]\exp(j\omega_1)$ or $z=\sigma_2[\pm 1+\sigma_2/2]\exp(j\omega_2)$ respectively.

Thus it is apparent that for low SNR, the spectral estimate corresponding to M=2 is incapable of
resolving the spectral peaks corresponding to the poles of the two complex sinusoidal signals.

There appears to be no straightforward recursion formula for the KA PEF coefficients, as in the
case of the single sinusoid example of Section 3.1. Therefore, following this approach, it is not
easy to determine the behavior of the KA PEFs in the case of large M and, in particular, whether
resolution of the two sinusoids is to be expected for the product $M\sigma_2^2$ sufficiently large, independent of
the value of $\sigma_2^2$. This problem, however, has been solved using powerful matrix techniques, by Marple [6].


4.2  The  Burg  Estimate  of  the  PEF 


Let it again be assumed (cf. Section 3.2) that N samples of the signal defined by (32) have been
given:

$$x_2(n) = A_0\exp[j(n\omega_0+\phi_0)]\left\{r\exp[j(n\Delta\omega+\Delta\phi)] + r^{-1}\exp[-j(n\Delta\omega+\Delta\phi)]\right\} + \epsilon(n), \qquad n = 0,1,2,\ldots,N-1 \tag{52}$$

Substituting (52) into (6) and (7), using (5c), and then applying (8a) and taking expected values with
respect to the additive white noise only yields

$$\langle\Pi_2(0)\rangle = A_0^2(r^2+r^{-2})\left[1 + 2\cos\Delta\phi_{mid}\,G(N,\Delta\omega)/(r^2+r^{-2})\right] + |\epsilon|^2 \tag{53}$$

for the expected signal-plus-noise power. Here

$$G(N,\Delta\omega) = \frac{\sin N\Delta\omega}{N\sin\Delta\omega} \tag{54}$$

is the common grating-function frequency response of a normalized, uniformly weighted discrete Fourier
transform of N data, and

$$\Delta\phi_{mid} = (N-1)\Delta\omega + 2\Delta\phi \tag{55}$$


is  the  phase  difference  between  the  two  complex  sinusoidal  components,  reckoned  at  the  middle  of  the 
data  set.  Note  that  there  may  or  may  not  be  a  datum  at  the  middle  of  the  data  set,  according  to 
whether  N  is  an  odd  or  even  integer,  respectively.  Also  note  that  A0  has  not  been  averaged,  but 
rather is assumed to be a fixed parameter of the particular set of data being analyzed. This assumption
corresponds to the usual practical case, where only one set of data is available.
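The signal part of the zero-order Burg power can be verified numerically for a noise-free record. The sketch below averages the squared magnitude of the two-sinusoid signal over N samples and compares it with the closed form built from the grating function and the mid-record phase difference. The parameter values and names are assumptions made for the example.

```python
import cmath
import math

def grating(N, dw):
    """G(N, dw) = sin(N*dw) / (N*sin(dw))."""
    return math.sin(N * dw) / (N * math.sin(dw))

A0, r, w0, dw, phi0, dphi, N = 1.0, 1.5, 0.9, 0.2, 0.3, 0.4, 16

def x2(n):
    """Noise-free two-sinusoid signal in mean/difference form."""
    carrier = A0 * cmath.exp(1j * (n * w0 + phi0))
    theta = n * dw + dphi
    return carrier * (r * cmath.exp(1j * theta) + cmath.exp(-1j * theta) / r)

# Zero-order Burg power: E(0)/(2N) is the mean squared magnitude of the data.
mean_power = sum(abs(x2(n)) ** 2 for n in range(N)) / N

S = A0 ** 2 * (r ** 2 + r ** -2)
dphi_mid = (N - 1) * dw + 2 * dphi            # phase difference at mid-record
closed_form = S * (1 + 2 * math.cos(dphi_mid) * grating(N, dw) / (r ** 2 + r ** -2))
```

The finite-record term vanishes either when the cosine of the mid-record phase difference is zero or when N times the half-separation is a multiple of pi, which is where the grating function has its nulls.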


Again, it is not easy to calculate the expected values of the reflection coefficients unless the
assumption of high SNR is made. In that case the same approximations can be made as in the derivation
of (27) to get

$$\langle\beta_2(1,1)\rangle = \exp(j\omega_0)\,\frac{\cos\Delta\omega + j\rho(r)\sin\Delta\omega + 2\cos\Delta\phi_{mid}\,G(N-1,\Delta\omega)/(r^2+r^{-2})}{1 + 2\cos\Delta\phi_{mid}\cos\Delta\omega\,G(N-1,\Delta\omega)/(r^2+r^{-2}) + \sigma_2^{-2}[1+B_2(1,N)]} \tag{56}$$

where the bias term $B_2(1,N)$ is of order $(N-1)^{-1}$ and is given by eqn. (A1) of Appendix A.

Comparison of (56) and (36) shows that for high SNR $\langle\hat{K}_2(1,1)\rangle$ is a biased estimate of $K_2(1,1)$ unless
$N \to \infty$. For infinite SNR and finite N, however, $\hat{K}_2(1,1)$ becomes an unbiased estimate of $K_2(1,1)$ if

$$\cos\Delta\phi_{mid} = 0 \qquad (57)$$

or

$$G(N-1,\Delta\omega) = 0 \qquad (58)$$

Similarly, comparison of (53) and (35) shows that $\hat{P}_2(0)$ is a biased estimate of $P_2(0)$ unless either (57)
is satisfied or

$$G(N,\Delta\omega) = 0 \qquad (59)$$

It is impossible to satisfy (58) and (59) simultaneously for N finite, but when (57) is satisfied the
Burg spectral estimate (14) is unbiased for M=1 and infinite SNR.
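
The infinite-SNR form of (56) can be compared against a direct evaluation of the first-stage Burg reflection coefficient. The Python sketch below is a check under stated assumptions rather than the author's code: the sign convention of the reflection coefficient (prediction error equal to x(n) minus the coefficient times x(n-1)) and the definition ρ(r) = (r² - r⁻²)/(r² + r⁻²) are assumed here, since both are defined outside this section, and all names and parameter values are hypothetical.

```python
import numpy as np

def grating(N, dw):
    """Grating function (54)."""
    return np.sin(N * dw) / (N * np.sin(dw))

def burg_first_stage(x):
    """First-stage Burg reflection coefficient (assumed sign convention)."""
    f, b = x[1:], x[:-1]                      # x(n) and x(n-1), n = 1..N-1
    return 2 * np.sum(f * np.conj(b)) / np.sum(np.abs(f) ** 2 + np.abs(b) ** 2)

def first_stage_closed_form(N, r, w0, dw, dphi):
    """Right-hand side of (56) with the noise term set to zero."""
    S = r**2 + r**-2
    rho = (r**2 - r**-2) / S                  # assumed definition of rho(r)
    c_mid = np.cos((N - 1) * dw + 2 * dphi)   # cos of (55)
    G1 = grating(N - 1, dw)
    num = np.cos(dw) + 1j * rho * np.sin(dw) + 2 * c_mid * G1 / S
    den = 1 + 2 * c_mid * np.cos(dw) * G1 / S
    return np.exp(1j * w0) * num / den

# Hypothetical parameter values, chosen only for illustration.
N, A0, r, w0, phi0, dw, dphi = 21, 0.4, 1.5, 0.3, 0.2, 0.17, 0.6
n = np.arange(N)
theta = n * dw + dphi
x = A0 * np.exp(1j * (n * w0 + phi0)) * (r * np.exp(1j * theta) + np.exp(-1j * theta) / r)

k_burg = burg_first_stage(x)
k_formula = first_stage_closed_form(N, r, w0, dw, dphi)
print(k_burg, k_formula)
```

Under these assumptions the two values coincide, and setting cos Δφ_mid or G(N-1,Δω) to zero in the closed form leaves exp(jω₀)[cos Δω + jρ(r) sin Δω], which is the bias-free value referred to in (57) and (58).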


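Progressing now to the second stage (M=2) of the Burg algorithm, it is found that the algebra becomes
all but intractable unless the condition of infinite SNR is assumed. For this special case, (56),
(52), (5a-c) and (7) can be used to derive the following expression for $\hat{K}_2(2,2)$:

$$
\begin{aligned}
\hat{K}_2(2,2) = -\exp(j2\omega_0)\,\Big\{ &\{1 - 2\cos\Delta\phi_{mid}\,G(N-2,\Delta\omega)/(r^2+r^{-2})\} \\
&- 2\cos\Delta\phi_{mid}\,G(N-1,\Delta\omega)\,\{\cos\Delta\phi_{mid}\,G(N-2,\Delta\omega)[\cos\Delta\omega + j\rho(r)\sin\Delta\omega] - 2\cos\Delta\omega/(r^2+r^{-2})\} \\
&+ \cos^2\Delta\phi_{mid}\,G^2(N-1,\Delta\omega)\,\{[\cos 2\Delta\omega + j\rho(r)\sin 2\Delta\omega] - 2\cos\Delta\phi_{mid}\,G(N-2,\Delta\omega)/(r^2+r^{-2})\} \Big\} \\
\Big/ \Big\{ &\{1 - 2\cos\Delta\phi_{mid}\,G(N-2,\Delta\omega)/(r^2+r^{-2})\} \\
&- 2\cos\Delta\phi_{mid}\cos\Delta\omega\,G(N-1,\Delta\omega)\,\{\cos\Delta\phi_{mid}\,G(N-2,\Delta\omega) - 2/(r^2+r^{-2})\} \\
&+ \cos^2\Delta\phi_{mid}\,G^2(N-1,\Delta\omega)\,\{1 - 2\cos\Delta\phi_{mid}\cos 2\Delta\omega\,G(N-2,\Delta\omega)/(r^2+r^{-2})\} \Big\} \qquad (60)
\end{aligned}
$$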

Examination of (60) shows that, even for infinite SNR, the "correct" value of $-\exp(j2\omega_0)$ for the
reflection coefficient $\hat{K}_2(2,2)$ is not realized for N finite unless either (57) or (58) is satisfied.
Realization of either of these conditions for infinite SNR will, as examination of (60) shows, cause
the magnitude of the reflection coefficient to be unity. Thus it can be inferred that, for sufficiently
high SNR, the two crucial factors affecting the reflection coefficient computed using the Burg algorithm
are the phase difference between the two complex sinusoids at the middle of the data record, $\Delta\phi_{mid}$, and
the number of cycles of difference frequency contained in the finite-length data record.

4.3 Discussion of the Two-Pole Complex Sinusoid Case

The results of Sections 4.1 and 4.2 show that even for infinite SNR, the first (M=1) and second
(M=2) order PEFs generated by the Burg algorithm are biased as compared to the PEFs generated by the KA
technique. It is clear from the appearance of the grating function (54) in the equations (56) and (60)
for the first- and second-order reflection coefficients that the magnitude of such biases will have inverse
dependence on N, the length of the data record.

The effect of this bias is to reduce the magnitude of the reflection coefficient and thus to allow
significant levels of uncancelled signal energy to propagate beyond the stage M=2 in the Burg algorithm.
Then PEFs of successively higher order can be based on this coherent "leakage" signal. However, whenever
one of the criteria described by (57) or (58) is satisfied, no significant coherent leakage signal
is propagated beyond the stage M=2. It is conjectured that it is the presence or absence of this coherent
leakage signal beyond the stage M=2 that determines whether or not line splitting will occur for
PEFs of some higher order. Results of both previously published [7,8] and new simulation studies
support this conjecture, as indicated in Section 5 below.


5. Results of Some Simulation Studies


This section presents the results of some studies of the performance of the complex Burg
algorithm for the analysis of signals known to be comprised of two complex sinusoids in the presence of
very weak additive complex white noise ($\sigma^2 = -77$ dB). These studies parallel and extend a set of studies
performed by Fougere et al [8] using the real-arithmetic Burg algorithm to estimate spectra of a single
real sine-wave signal in the presence of very weak additive real white noise.


It will be necessary to make comparisons between complex data of the form (52) and real data
$x_s(n)$ of the form

$$x_s(n) = \sin(n\omega_s + \phi_s) + \varepsilon_s(n), \qquad n = 0, 1, \ldots, N-1 \qquad (61)$$

which is comprised of N samples of a real sine wave with initial phase $\phi_s$ plus additive uncorrelated
noise samples $\varepsilon_s(n)$. The angular frequency $\omega_s$ is given by

$$\omega_s = 2\pi f_s \Delta t \qquad (62)$$

where $\Delta t$ is the sampling interval (sec) and $f_s$ is the signal frequency (Hz). Equation (61) can easily
be rewritten in the form of (52) by letting $A_0 = 0.5$, $r = 1$, $\phi_0 = 0$, $\Delta\phi = \phi_s - \pi/2$, $\omega_0 = 0$ and $\Delta\omega = \omega_s$. Then the
phase difference between the two complex components of the sine wave reckoned at the middle of the data
record is, according to (55), given by

$$\Delta\phi_{mid} = (N-1)\omega_s + 2\phi_s - \pi \qquad (63)$$
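
The substitution that maps the real sine wave (61) onto the complex form (52) can be confirmed numerically. The short Python sketch below is illustrative: the values of N, ω_s and φ_s are hypothetical, the noise term is omitted, and NumPy is assumed.

```python
import numpy as np

# Hypothetical record length, angular frequency and initial phase.
N, ws, phis = 16, 0.37, 0.9
n = np.arange(N)

real_form = np.sin(n * ws + phis)                      # (61) without noise

# Parameters of (52) as prescribed in the text.
A0, r, phi0, w0 = 0.5, 1.0, 0.0, 0.0
dw, dphi = ws, phis - np.pi / 2
theta = n * dw + dphi
complex_form = A0 * np.exp(1j * (n * w0 + phi0)) * (
    r * np.exp(1j * theta) + np.exp(-1j * theta) / r)  # (52) without noise

dphi_mid_55 = (N - 1) * dw + 2 * dphi                  # general definition (55)
dphi_mid_63 = (N - 1) * ws + 2 * phis - np.pi          # specialised form (63)
print(np.max(np.abs(complex_form - real_form)), dphi_mid_55 - dphi_mid_63)
```

Both printed differences should vanish to rounding error: the two complex exponentials of equal amplitude sum to a cosine, and the phase offset of -π/2 converts it to the sine of (61).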


The results of Fougere et al have been extended by allowing the value of r, the ratio of the
positive-frequency to negative-frequency signal amplitude, to range from 1 to $\infty$ in a series of six steps.
These steps are denoted by the letters A-F, and the relevant signal parameters are summarized in Table
1. For all steps the total signal power $A_0^2(r^2+r^{-2})$ was maintained constant and equal to 0.5, the power
of a real unit-amplitude sine wave. Also, the values $\phi_0 = 0$ and $\omega_0 = 0$ were maintained for all the
trials. These restrictions do not limit the generality of the results obtained. It is clear that an
arbitrary phase rotation of the entire data set will have no effect on its power spectrum. It is also
clear that, since terms of the form $\exp(jm\omega_0)$ can be factored out of the PEF coefficients $\hat{K}(m,M)$, the effect
of non-zero $\omega_0$ is simply to shift the estimated spectrum along the frequency axis (in a circular or
end-around fashion) by the amount $\omega_0$. Finally, it should be noted that for each set of cases examined, the
same set of noise-data samples was used with all sets of signal data.


5.1 Case 1

The signal data for Case 1 consisted of 21 samples of two complex sinusoids with angular frequencies
$\omega_{1,2} = \pm 2\pi/20$, so that $\Delta\omega = \omega_1$. The phase difference $\Delta\phi_{mid}$ was stepped from $-\pi$ to $+\pi$ in increments of
$2\pi/9$ radians, so that in all instances $\cos\Delta\phi_{mid} \neq 0$. All spectra were estimated using (14) and a
length 20 (M=19) Burg PEF.




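$$\langle \hat{K}_2(1,1) \rangle = \frac{j\rho(r) + 0.4\cos\Delta\phi_{mid}/(r^2+r^{-2})}{1 + 10^{-7.7}\,[1 + B_2(1,6)]} \qquad (64)$$

and (60) reduces to

$$\hat{K}_2(2,2) = -\{1 - 0.04\cos^2\Delta\phi_{mid}\}/\{1 + 0.04\cos^2\Delta\phi_{mid}\} \qquad (65)$$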


Consideration of (64) and (65) shows that here the Burg PEF should have greatest bias for $r = 1$, when
$(r^2+r^{-2})$ has its minimum value of 2, and that the bias should vanish for $r = \infty$, when $\rho(r) = 1$ and
$|\langle\hat{K}_2(1,1)\rangle| \approx 1 - 4\times10^{-8}$. For the case $r = \infty$, (60) and hence (65) are not valid.
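
The reduced expressions (64) and (65) can be compared with a direct two-stage Burg recursion on noise-free data. The Python sketch below rests on several assumptions that are not stated in the surviving text: N = 6 is taken from the caption of Figs. 9-14, Δω = π/2 is assumed because it is the value that yields the coefficients 0.4 = 2G(5,Δω) and 0.04 = G²(5,Δω) with G(4,Δω) = 0, the sign convention of the reflection coefficients is assumed as in a standard forward/backward lattice, and all function names are hypothetical.

```python
import numpy as np

def burg_reflection_coefficients(x, order):
    """Reflection coefficients from a standard Burg forward/backward recursion."""
    x = np.asarray(x, dtype=complex)
    f, b = x[1:], x[:-1]                      # forward and delayed backward errors
    ks = []
    for _ in range(order):
        k = 2 * np.sum(f * np.conj(b)) / np.sum(np.abs(f) ** 2 + np.abs(b) ** 2)
        ks.append(k)
        f_new = f - k * b                     # updated forward error
        b_new = b - np.conj(k) * f            # updated backward error
        f, b = f_new[1:], b_new[:-1]          # re-align for the next stage
    return ks

def case2_signal(dphi_mid, r=1.0, N=6, dw=np.pi / 2, power=0.5):
    """Noise-free data of form (52) with w0 = phi0 = 0 and total power 0.5."""
    A0 = np.sqrt(power / (r**2 + r**-2))
    dphi = 0.5 * (dphi_mid - (N - 1) * dw)    # invert (55)
    theta = np.arange(N) * dw + dphi
    return A0 * (r * np.exp(1j * theta) + np.exp(-1j * theta) / r)

results = []
for dphi_mid in np.linspace(-np.pi, np.pi, 9):
    k1, k2 = burg_reflection_coefficients(case2_signal(dphi_mid), 2)
    c = np.cos(dphi_mid)
    k1_eq64 = 0.4 * c / 2                     # (64) with r = 1, rho = 0, no noise
    k2_eq65 = -(1 - 0.04 * c**2) / (1 + 0.04 * c**2)   # (65)
    results.append((dphi_mid, k1, k1_eq64, k2, k2_eq65))
    print(f"{dphi_mid:+.3f}  |k2| = {abs(k2):.6f}  (65): {abs(k2_eq65):.6f}")
```

For r = 1 the recursion and the reduced formulas agree for every phase in the sweep, and the magnitude of the second reflection coefficient reaches unity only where cos Δφ_mid = 0, which is the behaviour on which the line-splitting conjecture rests.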


Examination of Figs. 9-14 supports all these conjectures. These figures show orthographic projections
of the 91 spectra, with $\omega$ ranging from $-\pi$ to $\pi$ across the page, and $\Delta\phi_{mid}$ increasing "into" the
page. The labelled vertical bar to the left of the spectra indicates a variation of 20 dB in power
spectral density. It is clear that there is no line-splitting for $\Delta\phi_{mid} = -5\pi/2$, $-3\pi/2$ and $-\pi/2$, and that
the splitting shows a quasi-cosinusoidal dependence on $\Delta\phi_{mid}$ as suggested by the form of (64). The
splitting became less severe as $r \to \infty$, again as might be inferred from (64), until for Case 2E the weaker
signal pole was correctly estimated as a single pole. Examination of the zeroes of the PEFs and the
residue powers showed measurable splitting of the stronger signal poles even for this case. Case 2F of
course showed no line splitting, since there was then only one signal pole.

The "banding" effect visible most clearly in Fig. 9 (Case 2A) and to a decreasing extent in subsequent
figures can be explained on the basis that when $\cos\Delta\phi_{mid} = 0$, $|\hat{K}_2(2,2)| \approx 1$ so that the output error
power was greatly reduced in those cases. This caused a shift in the level of the spectrum, as can be
seen from the dependence through (8b) of the numerator of (14) on this quantity. This also explains the
obvious drop in spectral level in Fig. 14, where $|\hat{K}_2(1,1)| \approx 1$ for all values of $\Delta\phi_{mid}$.

Examination of the residue powers showed that, when line-splitting occurred, a significant portion
of the signal power was accounted for by each pole of a split pair; in fact for Case 2A and $|\cos\Delta\phi_{mid}| = 1$,
70% of the signal powers appeared at the more severely deviated poles, and only 30% at less severely
deviated poles. That the greater portion of the signal power was associated with the more deviated
pole appeared for these data to be true in general. The present theory makes no prediction as to
why this should be the case, although in principle the theory for the infinite SNR model could be extended
to do so.


5.3  Case  3 


The signal data for Case 3 consisted of 25 sets of 101 samples of two complex sinusoids with $\Delta\phi_{mid} = 2\pi$
in all cases. Again $\omega_0 = 0$ was chosen, and $\Delta\omega$ was stepped from $2\pi\times0.0125$ to $2\pi\times0.4925$ inclusive in
increments of $2\pi\times0.02$. Thus Case 3A parallels Case 4 of [8], where 101 samples were taken at intervals
$\Delta t = 0.01$ s of real unit-amplitude sine waves with $\phi_s = \pi/4$ and $f_s$ stepped from 1.25 Hz to 49.25 Hz inclusive
in steps of 2 Hz. In all cases the spectra were estimated using (14) and a length 25 (M=24) Burg PEF.
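
That the Case 3 parameters hold the mid-record phase difference at a multiple of 2π can be verified from (62) and (63). The Python sketch below is a simple check using the values quoted in the text; the variable names are hypothetical and NumPy is assumed.

```python
import numpy as np

N, dt, phis = 101, 0.01, np.pi / 4          # record length, sampling interval, initial phase
fs = 1.25 + 2.0 * np.arange(25)             # 1.25 Hz to 49.25 Hz in steps of 2 Hz
ws = 2 * np.pi * fs * dt                    # angular frequencies, (62)
dphi_mid = (N - 1) * ws + 2 * phis - np.pi  # mid-record phase difference, (63)

cycles = dphi_mid / (2 * np.pi)             # should be whole numbers of cycles
print(cycles)
print(ws[0] / (2 * np.pi), ws[-1] / (2 * np.pi))
```

Because (N-1)Δt = 1 s, each frequency contributes f_s whole or fractional cycles across the record, and the quarter cycle left over from 1.25 Hz is cancelled by the term 2φ_s - π, so cos Δφ_mid = 1 for every data set.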

Figures 15-20 (Cases 3A-F) show the spectral estimates obtained from the data. These figures are
again orthographic projections with $\Delta\omega$ increasing "into" the page. Any comments on the detailed structure
of the line-splitting shown would necessarily be speculative, but some general observations can be
made.

The first remark is that line-splitting appears to become less severe as r was increased in value,
as examination of (56) and (60) might suggest, and in fact line-splitting vanished for $r = \infty$ as discussed
in Section 5.2.

The second remark concerns the possible dependence of the spectral level on the values of $G(N-1,\Delta\omega)$
and $G(N-2,\Delta\omega)$. These values are given for the plotted spectra in Table 2. It is interesting to note
that the minimum spectral level occurred at the minimum values for $|G(N-1,\Delta\omega)|$, $|G(N-2,\Delta\omega)|$ and $|\cos\Delta\omega|$,
and the higher spectral levels were observed when $G(N-2,\Delta\omega) < 0$, or $\Delta\omega > 2\pi\times0.25$. Examination of (60) shows
that as $\Delta\omega$ exceeds the value $\pi/2$ certain terms change sign in such a manner as to decrease the magnitude
of $\hat{K}_2(2,2)$ and thus perhaps to increase the magnitude of the numerator of (14) through the relation (8b).

Finally, for Case 3F it is again observed that the spectral level drops as $r \to \infty$ and the bias term in
(56) vanishes, similar to Case 2F. For Case 3F only, the estimated poles accurately reflected the true
signal pole locations, were unsplit and had correct residue powers. For all other cases, examination
of the locations of the poles of the Burg spectral estimate showed the existence of multiple poles with
significant residue power in the vicinity of the true locations of the two signal poles.




TABLE 2

Values of $(\Delta\omega/2\pi) \times 100$, $G(N-1,\Delta\omega)$ and $G(N-2,\Delta\omega)$ for data from Case 3

(Δω/2π) × 100    G(N-1,Δω)    G(N-2,Δω)    cos Δω

     1.25          0.1275       0.1823      0.9969
     3.25          0.0493       0.0488      0.9792
     5.25          0.0309       0.0295      0.9461
     7.25          0.0227       0.0206      0.8980
     9.25          0.0182       0.0125      0.8358
    11.25          0.0154       0.0118      0.7604
    13.25          0.0135       0.0092      0.6730
    15.25          0.0122       0.0071      0.5750
    17.25          0.0113       0.0058      0.4679
    19.25          0.0107       0.0033      0.3535
    21.25          0.0103       0.0024      0.2334
    23.25          0.0101       0.0011      0.1097
    25.25          0.0100      -0.0002     -0.0157
    27.25          0.0101      -0.0014     -0.1409
    29.25          0.0104      -0.0028     -0.2639
    31.25          0.0108      -0.0042     -0.3827
    33.25          0.0115      -0.0058     -0.4955
    35.25          0.0125      -0.0076     -0.6004
    37.25          0.0135      -0.0098     -0.6959
    39.25          0.0160      -0.0126     -0.7804
    41.25          0.0191      -0.0165     -0.8526
    43.25          0.0243      -0.0224     -0.9114
    45.25          0.0340      -0.0328     -0.9558
    47.25          0.0582      -0.0579     -0.9551
    49.25          0.2123      -0.2142     -0.9889
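
The quantities tabulated above follow directly from the grating function (54) with N = 101. The Python sketch below recomputes them from the Case 3 frequency grid; it is an illustrative aid with hypothetical variable names, and NumPy is assumed.

```python
import numpy as np

def grating(N, dw):
    """Grating function (54)."""
    return np.sin(N * dw) / (N * np.sin(dw))

N = 101
frac = 0.0125 + 0.02 * np.arange(25)     # dw / (2*pi) for the 25 data sets of Case 3
dw = 2 * np.pi * frac

G1 = grating(N - 1, dw)                  # G(N-1, dw)
G2 = grating(N - 2, dw)                  # G(N-2, dw)
cos_dw = np.cos(dw)

for row in zip(100 * frac, G1, G2, cos_dw):
    print("%6.2f %9.4f %9.4f %9.4f" % row)
```

For this grid sin[(N-1)Δω] = 1, so that G(N-1,Δω) = 1/[(N-1) sin Δω] and G(N-2,Δω) = cos Δω/[(N-2) sin Δω]; the sign change of G(N-2,Δω) therefore coincides with Δω passing through π/2.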

6. Summary and Conclusions

The theoretical properties of autoregressive spectral analysis schemes have been analyzed when the
signal under investigation is known to be comprised of one or two complex sinusoids in additive white
noise. This latter case includes as a special case data comprising a single real sine wave in additive
white noise. It has been shown that when the autocorrelation of the signal is known, the frequency of a
single complex sinusoid can always be extracted, independent of the signal-to-noise ratio (SNR), provided
enough samples of the autocorrelation are available. It was also shown that the Burg algorithm
correctly extracts the frequency of a single sinusoid in additive white noise from the complex amplitude
time-series data, provided the SNR is sufficiently high.

The situation when two complex sinusoids are present is much more complicated. It was found to be
fairly difficult to derive general equations describing the spectrum for the known autocorrelation (KA)
case, even for a two-pole autoregressive model. Nevertheless these equations served as a useful touchstone
for the extremely complicated Burg equations for the analysis of time-series amplitude data. Detailed
theoretical analysis showed that, unlike the KA case, the Burg spectral estimate is expected to
be sensitive both to the number of cycles of the difference frequency between the two components contained
in the finite-length data record, and in particular to the relative phase difference between the
two complex sinusoidal components at the middle of the data record. Finally, simulation results were
shown to be fully compatible with the conjectured basis of line-splitting presented here.

7. Acknowledgement

This work is supported by the Department of National Defence under Research and Development Branch
Project No. 33C69.

Appendix  A 

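The bias term $B_2(1,N)$ found in the expression for $\langle\hat{K}_2(1,1)\rangle$ is:

$$
\begin{aligned}
B_2(1,N) = (N-1)^{-1}\Big\{ 1 + [(N-2)/(N-1)] \times\; &[\cos\Delta\omega + j\rho(r)\sin\Delta\omega + \cos\Delta\phi_{mid}\cos\Delta\omega\,G(N-2,\Delta\omega)/(r^2+r^{-2})] \\
&[\cos\Delta\omega + j\rho(r)\sin\Delta\omega + 2\cos\Delta\phi_{mid}\cos\Delta\omega\,G(N-1,\Delta\omega)/(r^2+r^{-2})] \Big\} \qquad (A1)
\end{aligned}
$$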




References

1. Burg, J.P., "Maximum Entropy Spectral Analysis", presented at the 37th Meeting of the Society of
Exploration Geophysicists, Oklahoma City, 31 October 1967.

2. Burg, J.P., "A New Analysis Technique for Time Series Data", presented at the NATO Advanced Study
Institute on Signal Processing with Emphasis on Underwater Acoustics, Enschede, Netherlands, 1968.

3. Burg, J.P., "Maximum Entropy Spectral Analysis", Ph.D. Thesis, Stanford University, Stanford,
California, May 1975.

4. Ulrych, T.J. and R.W. Clayton, "Time series modelling and maximum entropy", Phys. Earth Planet.
Inter., 12, 1976, pp. 188-200.

5. Frost, O.L., "Power Spectrum Estimation", in "Aspects of Signal Processing, Part 1", D. Reidel
Publishing Company, Dordrecht-Holland, 1977, pp. 125-162.

6. Marple, L.A., "Conventional Fourier, Autoregressive, and Special ARMA Methods of Spectrum Analysis",
Engineer's Degree Thesis, Stanford University, Stanford, California, December 1976.

7. Chen, W.Y. and G.R. Stegen, "Experiments with maximum entropy power spectra of sinusoids", J.
Geophys. Res., 79, No. 20, 10 July 1974, pp. 3019-3022.

8. Fougere, P.F., E.J. Zawalick and H.R. Radoski, "Spontaneous line splitting in maximum entropy power
spectrum analysis", Phys. Earth Planet. Inter., 12, 1976, pp. 201-207.

9. Fougere, P.F., "A solution to the problem of spontaneous line splitting in maximum entropy power
spectrum analysis", J. Geophys. Res., 82, No. 7, 1 March 1977, pp. 1051-1054.

10. Makhoul, J., "Linear prediction: A tutorial review", Proc. IEEE, 63, No. 4, April 1975, pp. 561-580.

11. Makhoul, J., "Lattice methods in spectral estimation", Proceedings of the RADC Spectrum Estimation
Workshop, 24, 25 and 26 May 1978, pp. 159-173, AD-A054650.

12. Smylie, D.E., G.K.C. Clarke and T.J. Ulrych, "Analysis of Irregularities in the Earth's Rotation",
Methods in Computational Physics, 13, Academic Press, New York, 1973, pp. 391-430.

13. Andersen, N., "On the calculation of filter coefficients for maximum entropy spectral analysis",
Geophysics, 39, No. 1, February 1974, pp. 69-72.

14. Herring, R.W., "A Review of Maximum Entropy Spectral Analysis", CRC Technical Note No. 685, June 1977.

15. Johnsen, S.J. and N. Andersen, "On power estimation in maximum entropy spectral analysis",
Geophysics, 43, No. 4, June 1978, pp. 681-690.


Fig. 1 Network for generating N complex sine waves in additive white noise from complex white noise
input (after Fig. 4.1 of [6]).

Fig. 2 Basic all-zero lattice network (after [11]).

Figs. 9-14. Cases 2A-F (panels include Fig. 11, Case 2C). Estimated spectral power vs. angular
frequency. N=6, M=5, $\Delta\phi_{mid}$ stepped from $-5\pi/2$. (Orthographic projections.)

Figs. 15-20. Cases 3A-F (panels include Fig. 17, Case 3C, and Fig. 20, Case 3F). Estimated spectral
power vs. angular frequency. $\Delta\phi_{mid} = 2\pi$, N=101, M=24. $\Delta\omega$ stepped from $2\pi \times 0.0125$ to $2\pi \times 0.4925$ in
increments of $2\pi \times 0.02$. (Orthographic projections.)



A  TWO-DIMENSIONAL  MAXIMUM  ENTROPY 
SPECTRAL  ESTIMATOR 


Salim Roucos

Dept. of Electrical Engineering
University  of  Florida 
Gainesville,  FL  32611 

D.G.  Childers 

Dept,  of  Electrical  Engineering 
University  of  Florida 
Gainesville,  FL  32611 


ABSTRACT 

Using the ideas from one-dimensional (1-D) maximum entropy spectral estimation,
we derive a 2-D spectral estimator by extrapolating the 2-D sampled
autocorrelation (covariance) function. The maximum entropy method used here
maximizes the entropy of a set of random variables. The extrapolation (prediction)
process under this maximum entropy condition is shown to correspond to
the most random extension or equivalently to the maximization of the mean
square prediction error when the optimum predictor is used. The 2-D extrapolation
must be terminated by the investigator. The Fourier transform of the
extrapolated autocorrelation function is our 2-D spectral estimator. Using
this method, one can apply windowing prior to calculating the spectral
estimate.

A specific algorithm for estimating the 2-D spectrum is presented and its
computational complexity is estimated. The algorithm has been programmed and
computer examples are presented.

I. INTRODUCTION


For time series or one-dimensional (1-D) data, one may consider the maximum
entropy (ME) formulation [1] as a procedure for deriving a spectral estimator
such that the entropy of the signal is maximized subject to the constraint
that the spectral estimate is consistent with the known autocorrelation
values. This spectral estimator is the same as that derived by autoregressive
or linear predictive methods [2]. Some authors have considered this
criterion as a smoothing or a whitening process [3-8], an interpretation which
has been advanced for both the ME and linear predictive (LP) methods.

For 1-D data the solution to the ME spectral estimation problem is
achieved via a polynomial spectral factorization. However, for the 2-D case
such an approach is not fruitful since 2-D polynomials cannot in general be
factored, i.e., the Fundamental Theorem of Algebra does not hold. For this
reason there has been some concern in the literature about the existence of a
2-D ME spectral estimator. That is to say, while it seemed quite natural to
extend Shannon's 1-D entropy ideas to 2-D, there was no assurance that a 2-D
ME spectral estimator existed or, if it did, that it was unique.

Barnard and Burg [10] originally hypothesized such a 2-D ME spectral estimator,
giving the expression for the estimator and suggesting that it could be
derived via a Lagrangian multiplier approach. Ables [3] agreed with this approach
and suggested a modified constraint on the 2-D ME function which would
account for noisy data. Ponsonby [11] attempted to derive the 2-D ME spectral
estimator following the suggestions in [3,10] but was unable to determine a
closed form analytical solution due to the nonlinear integral equations. An
iterative numerical solution was developed by Wernecke and D'Addario [5] for
application to radio astronomic data. Wernecke [4] justifies the ME
reconstruction model in terms of its smoothing properties. From the viewpoint
of image processing an alternate model has been used [12-15].

The 2-D ME spectral estimator is not the result of a simple extension of
the 1-D solution. In fact, Woods [16] has provided a constructive proof of
the existence and uniqueness of a 2-D discrete Markov random field which
agrees with known correlation values in a nearest neighbor array, thus placing
the 2-D spectral estimator on a firm theoretical footing. As might be anticipated,
the derivation and the algorithm are considerably more complicated than
those for the 1-D case. The corresponding spectrum is the 2-D ME spectrum
[10]. Woods' algorithm is in some ways an improvement on Ong's [17] ME algorithm.
However, the computational load is still quite extensive for even small
(3 X 3) data arrays (a 5 X 5 correlation array). The computation time is dependent
upon the degree of approximation used in calculating the spectrum;
typical values for Woods' algorithm are 5 minutes and 20 minutes on an IBM
360/44 for two different approximations to the maximum entropy spectrum for a
5 X 5 correlation array.

In our search to find a more efficient 2-D spectral estimator we decided
to extrapolate the sampled autocorrelation function, considered as that of a
set of random variables, under the maximum entropy condition and then Fourier
transform the extrapolated autocorrelation function to obtain the estimate of
the spectral density of the random field.

Some of the properties of our estimator are not investigated but its existence
and uniqueness under certain conditions are proven. The argument that
this estimator will yield a high resolution spectral estimator is based on the
analogy with the one-dimensional case, where a maximum entropy extension of
the autocorrelation function resulted in a better spectral estimation than an
extension of the autocorrelation function such as that obtained by appending
zeros to artificially extend the duration of the autocorrelation function.

We illustrate our results with computed examples and show that our algorithm
appears to be more efficient, computationally, than that of Woods [16].



II.  OVERVIEW  OF  OUR  APPROACH 


The 1-D algorithm does not require that the actual extrapolation be performed;
rather the spectral estimate is obtained directly from a system of
linear equations in the known correlation values (Yule-Walker equations).

Our approach for the 2-D case is to formulate the problem such that we
actually perform an extrapolation of the known 2-D autocorrelation values. At
each step of the extrapolation, we hypothesize that an additional sample of
the random data field is available. However, its probability structure as a
random variable with respect to the previous set of samples is not completely
known. The 2-D ME extension used in the paper is that the entropy of the new
set of random variables is maximized subject to the constraint that these random
variables are samples from a stationary random field. By successively
choosing the location of the hypothetical samples, we are defining a one to
one map between the set of points in the (x,t) plane and the set of integers.
This defines an order for the random variables. (In particular this order is
implicit in the autocorrelation matrix of the random variables.) This map
will be called the spatio-temporal "1-D" extension path. The uniqueness of
the spectral estimation with respect to the choice of this path is unsolved.

The autocorrelation extension process is reduced to a linear prediction
model, in its general sense, where the new random variable in the "1-D" extension
path is predicted from the existing (known) random variables. The corresponding
prediction error of the model is maximized so as to produce the most
random extension of the autocorrelation matrix.


In the 1-D case, once the order of the LP or AR model is selected, the
solution to the linear difference equations is determined by the initial conditions.
If one attempts to derive a higher order model from the extended autocorrelation
matrix, the coefficients of the new model reduce identically to
the coefficients of the original lower order model.


For the 2-D case the order of the model is dictated by the number of
available samples of the random data field. Therefore, the model order increases
with each extension step, yielding new prediction filter coefficients
at each iteration. In contrast with the 1-D maximum entropy extension method,
the extension for the 2-D case must be terminated by the investigator.


Note specifically that the extension process described here deals with
the autocorrelation values whereas the 1-D Burg technique [10] is applied to
the actual data.


A.  Formulation  of  the  Extension 

Let the real random data field be denoted as f(x,t) such that the process
is Gaussian with zero mean and stationary in time and space. The autocorrelation
function is given by

$$R(\rho,\tau) = E[\, f(x,t)\, f(x+\rho,\, t+\tau) \,] \qquad (1)$$




where E denotes the ensemble average. This function is also the autocovariance
since we have assumed a zero mean process. This function is symmetric
and positive semidefinite. We may, of course, generalize to complex random
data fields.

We denote our matrices as

$$A_N = E\left\{ \begin{bmatrix} f(x_1,t_1) \\ f(x_2,t_2) \\ \vdots \\ f(x_N,t_N) \end{bmatrix} \begin{bmatrix} f(x_1,t_1) & f(x_2,t_2) & \cdots & f(x_N,t_N) \end{bmatrix} \right\} \qquad (2)$$

where the ordering of $A_N$ is arbitrary in contrast with the uniform sampling
1-D case. Because of this the matrices are not necessarily Toeplitz even
though the process is stationary. However, $R_{ij} = R_{ji}$; thus $A_N$ is symmetric and
positive semidefinite.

The power spectrum is defined as [18,19]

$$P(f,k) = \iint R(\rho,\tau)\, e^{-2\pi i (f\tau - k\rho)}\, d\tau\, d\rho \qquad (3)$$

We assume the 2-D (N X N) covariance (autocorrelation) matrix $A_N$ is
known and obtained by sampling the spatio-temporal field at $[(x_1,t_1),
(x_2,t_2), \ldots, (x_N,t_N)]$. We ask, what should the extended covariance matrix be
if an additional sample of the random field at location $(x_{N+1}, t_{N+1})$ is available?
The stationarity of the random field is maintained and the entropy of
the new set of (N+1) random variables is maximized with respect to the unknown
autocovariance values.

We illustrate the extrapolation by an example.

Example


Imagine the random field is sampled at four locations [(0,0), (0,1),
(1,1), (1,0)] as shown in Figure 1, then

$$A_4 = \begin{bmatrix} R_{11} & R_{12} & R_{13} & R_{14} \\ R_{21} & R_{22} & R_{23} & R_{24} \\ R_{31} & R_{32} & R_{33} & R_{34} \\ R_{41} & R_{42} & R_{43} & R_{44} \end{bmatrix} = \begin{bmatrix} R(0,0) & R(0,1) & R(1,1) & R(1,0) \\ R(0,1) & R(0,0) & R(1,0) & R(1,-1) \\ R(1,1) & R(1,0) & R(0,0) & R(0,1) \\ R(1,0) & R(1,-1) & R(0,1) & R(0,0) \end{bmatrix} \qquad (4)$$

We assume that $A_4$ is known.

Note that an element $R_{ij}$ of the above matrix corresponds to sampling the
autocorrelation function R(m,n) at a point dictated by the ordering of the
random variables via the "1-D" extension path. These samples are shown in
Figure 2.

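Now suppose an additional hypothetical sample at (0,2) in the (x,t) plane
is available (as shown by the open circle in Figure 1), then

$$A_5 = \left[\begin{array}{cccc|c} & & & & R_{15} \\ & A_4 & & & R_{25} \\ & & & & R_{35} \\ & & & & R_{45} \\ \hline R_{51} & R_{52} & R_{53} & R_{54} & R_{55} \end{array}\right] = \left[\begin{array}{cccc|c} & & & & R(0,2) \\ & A_4 & & & R(0,1) \\ & & & & R(-1,1) \\ & & & & R(-1,2) \\ \hline R(0,2) & R(0,1) & R(-1,1) & R(-1,2) & R(0,0) \end{array}\right] \qquad (5)$$

The elements in the last row are determined by symmetry. Only R(0,2) and
R(-1,2) are undetermined.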


The entropy of this set of random variables is $\log|A_5|$, hence we want to
maximize $|A_5|$. Letting $\underline{R}^T = (R_{51}, R_{52}, R_{53}, R_{54})$, we have

$$|A_5| = \begin{vmatrix} A_4 & \underline{R} \\ \underline{R}^T & R(0,0) \end{vmatrix} = |A_4|\,[\, -\underline{R}^T A_4^{-1} \underline{R} + R(0,0) \,] \qquad (6)$$

Since $A_4$ is fixed, we need to maximize $R(0,0) - \underline{R}^T A_4^{-1} \underline{R}$ with respect to
R(0,2) and R(-1,2). This will be shown in the next section to correspond to
maximizing the prediction error of an optimum linear predictor.


III.  THE  "MOST  RANDOM"  EXTENSION 


We may illustrate our idea by reconsidering the previous example, but we
number the random variables in the (x,t) plane as $\{v_i\}$, i = 1,2,3,4, where the
correspondence between $v_i$ and its location on the (x,t) plane is arbitrary.
We let $v_5$ denote the open circle random variable at (0,2). The linear
prediction estimate of $v_5$ in terms of the other variables is

$$
\hat{v}_5 = \sum_{i=1}^{4} a_i v_i \qquad (7)
$$


The prediction error is

$$
e_5 = v_5 - \hat{v}_5 = \sum_{j=1}^{5} -a_j v_j , \quad \text{where } a_5 = -1 \qquad (8)
$$

We determine the coefficients, $a_i$, such that $E[e_5^2] = \overline{e_5^2}$ is minimized.


By orthogonality

$$
\overline{e_5 v_i} = 0 = -\sum_{j=1}^{5} a_j \overline{v_j v_i} = -\sum_{j=1}^{5} a_j R_{ji} , \quad i = 1,2,3,4 \qquad (9)
$$


In matrix notation the solution becomes

$$
\underline{A} = A_4^{-1} \underline{R}_e \qquad (10)
$$

where $\underline{A}$ is the column vector $(a_1, a_2, a_3, a_4)^T$, $A_4$ is assumed nonsingular
and $\underline{R}_e$ is the column vector $(R_{15}, R_{25}, R_{35}, R_{45})^T$.

The mean square prediction error when using the optimum predictor
$\underline{A} = A_4^{-1} \underline{R}_e$ is given by

$$
\overline{e_5^2} = R_{55} - \underline{R}_e^T \underline{A} = R(0,0) - \underline{R}_e^T A_4^{-1} \underline{R}_e \qquad (11)
$$

which gives the variance of the error as a function of the autocorrelation
function when an optimum predictor is used. Note that $\underline{R}_e$ is not completely
known and the optimum predictor depends upon $\underline{R}_e$. Hence if we choose $\underline{R}_e$ such
that the corresponding optimum predictor has the largest prediction error of
all other choices of $\underline{R}_e$ consistent with the stationarity constraint, then this
extension is exactly the maximum entropy extension as described in the
previous section.
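
The link between the determinant in (6) and the prediction error in (11) can be checked numerically. The sketch below is an illustration only: all autocorrelation values, including the trial guesses for the undetermined entries R(0,2) and R(-1,2), are made-up numbers, and the variable names are not from the paper.

```python
import numpy as np

# Hypothetical autocorrelation values (assumed for illustration only).
R00, R01, R10, R11, R1m1 = 1.0, 0.5, 0.4, 0.2, 0.2
A4 = np.array([[R00, R01, R11, R10],
               [R01, R00, R10, R1m1],
               [R11, R10, R00, R01],
               [R10, R1m1, R01, R00]])

# R_e = (R(0,2), R(0,1), R(-1,1), R(-1,2)); the first and last entries are
# arbitrary trial values, the middle two follow from stationarity.
R_e = np.array([0.1, R01, R1m1, 0.05])

a = np.linalg.solve(A4, R_e)      # optimum predictor coefficients, eq. (10)
err = R00 - R_e @ a               # mean square prediction error, eq. (11)

# Bordered matrix A5 of eq. (5) and the determinant ratio of eq. (6).
A5 = np.block([[A4, R_e[:, None]],
               [R_e[None, :], np.array([[R00]])]])
det_ratio = np.linalg.det(A5) / np.linalg.det(A4)

# Orthogonality: the error is uncorrelated with v_1..v_4.
orth = R_e - A4 @ a
print(err, det_ratio)
```

Because |A_5| / |A_4| equals the prediction error, maximizing the determinant over the unknown lags is the same as maximizing the error of the optimum predictor.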

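IV. EXISTENCE AND UNIQUENESS OF THE SOLUTION

Our method is basically one of extrapolation which we have shown reduces
to the maximization of the determinant,

$$
|A_{N+1}| = \begin{vmatrix} A_N & \underline{R}_e \\ \underline{R}_e^T & R(0,0) \end{vmatrix} \qquad (12)
$$

subject to the constraints that some components of $\underline{R}_e$ are known, say m of
them. Rearrange $\underline{R}_e$ such that the unknown components occur at the top of $\underline{R}_e$
($\underline{R}_e$ is a column vector) to obtain,

$$
\begin{vmatrix}
A_N^r & \begin{matrix} \underline{R}_e^u \\ \underline{R}_e^k \end{matrix} \\
\begin{matrix} \underline{R}_e^{uT} & \underline{R}_e^{kT} \end{matrix} & R(0,0)
\end{vmatrix} \qquad (13)
$$

where the superscripts r, u, k stand for rearranged, unknown, and known,
respectively. This determinant is equal to,

$$
|A_N^r| \left[ R(0,0) - \underline{R}_e^{rT} (A_N^r)^{-1} \underline{R}_e^r \right] ,
\quad \underline{R}_e^r = \begin{bmatrix} \underline{R}_e^u \\ \underline{R}_e^k \end{bmatrix} \qquad (14)
$$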


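which we wish to maximize with respect to the components of $\underline{R}_e^u$; since
$\overline{e^2} = R(0,0) - \underline{R}_e^T A_N^{-1} \underline{R}_e \ge 0$ is required for an autocorrelation matrix
extension, and R(0,0) is fixed, we have to minimize

$$
\begin{bmatrix} \underline{R}_e^{uT} & \underline{R}_e^{kT} \end{bmatrix}
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
\begin{bmatrix} \underline{R}_e^u \\ \underline{R}_e^k \end{bmatrix} \qquad (15)
$$

where we properly partitioned $(A_N^r)^{-1}$; expanding, we minimize

$$
\left[ \underline{R}_e^{uT} A \underline{R}_e^u + \underline{R}_e^{uT} B \underline{R}_e^k + \underline{R}_e^{kT} C \underline{R}_e^u + \underline{R}_e^{kT} D \underline{R}_e^k \right] \qquad (16)
$$

with respect to the elements of $\underline{R}_e^u$. This gives

$$
\underline{R}_e^u = -A^{-1} B \underline{R}_e^k \qquad (17)
$$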

Hence $\underline{R}_e^u$ exists and is unique as long as we can assume that $A_N$ is positive
definite. But at any step of the extrapolation we have,

$$
|A_{N+1}| = |A_N| \left[ R(0,0) - \underline{R}_e^T A_N^{-1} \underline{R}_e \right] \qquad (18)
$$

Thus $A_{N+1}$ will be positive definite if the prediction error variance
$R(0,0) - \underline{R}_e^T A_N^{-1} \underline{R}_e$ is positive. Therefore if the covariance matrix at step N
is positive definite the extrapolation will yield a positive definite
extension, provided the error variance is not null.
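
As a hedged illustration of (17), the sketch below applies it to the four-sample example, where the unknown components of the new column are R(0,2) and R(-1,2) and the known ones are R(0,1) and R(-1,1). The numerical autocorrelation values are invented, and instead of physically rearranging rows and columns the code picks the blocks A and B of the inverse with index lists, which is an implementation choice and not the paper's procedure.

```python
import numpy as np

# Hypothetical autocorrelation values (assumed for illustration only).
R00, R01, R10, R11, R1m1 = 1.0, 0.5, 0.4, 0.2, 0.2
A4 = np.array([[R00, R01, R11, R10],
               [R01, R00, R10, R1m1],
               [R11, R10, R00, R01],
               [R10, R1m1, R01, R00]])

unknown = [0, 3]                  # positions of R(0,2) and R(-1,2) in R_e
known = [1, 2]                    # positions of R(0,1) and R(-1,1) in R_e
R_k = np.array([R01, R1m1])       # known components, fixed by stationarity

P = np.linalg.inv(A4)
A = P[np.ix_(unknown, unknown)]   # block of the inverse acting on unknowns
B = P[np.ix_(unknown, known)]     # cross block between unknowns and knowns
R_u = -np.linalg.solve(A, B @ R_k)    # eq. (17)

def prediction_error(r_u):
    """Error variance R(0,0) - R_e^T A4^{-1} R_e for trial unknowns r_u."""
    R_e = np.empty(4)
    R_e[unknown] = r_u
    R_e[known] = R_k
    return R00 - R_e @ P @ R_e

best = prediction_error(R_u)
print(R_u, best)
```

Any perturbation of `R_u` lowers `prediction_error`, since the quadratic form in (16) is strictly convex in the unknown components when the matrix is positive definite.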


V.  THE  MAXIMUM  ENTROPY  TWO-DIMENSIONAL  SPECTRAL  ESTIMATOR 

An efficient computational algorithm follows noting that $A_{M+1}^{-1}$ can be
recursively computed from $A_M^{-1}$*.

We assume $A_N$ to be known and positive definite.

1. Compute $A_N^{-1}$, let M = N.

2. Generate the position of a hypothetical new sample (assuming an
extrapolation path).

3. Find which components of $\underline{R}_e$ are known by stationarity.

4. Rearrange to obtain $(A_M^r)^{-1}$.

5. Compute $\underline{R}_e^u = -A^{-1} [B \underline{R}_e^k]$ where A and B are submatrices of $(A_M^r)^{-1}$.

6. Compute $A_M^{-1} \underline{R}_e$ and the prediction error $R(0,0) - \underline{R}_e^T A_M^{-1} \underline{R}_e$. If the
prediction error is zero, stop.

7. Compute $A_{M+1}^{-1}$ according to the recursion relations*.

8. Let M = M+1.

9. Is this extension adequate (i.e., is M large enough)?
If not, go to step 2.

10. Construct from the MXM autocorrelation matrix found in step 5 an LXL
autocorrelation array. This LXL array is the extended autocorrelation
function with maximum lag of (L-1)/2 in each dimension.

11. Take a two-dimensional FFT of the LXL array constructed in step 10.

Steps 9, 10, and 11 above need some further explanation. Suppose one
decides that the final extended autocorrelation (covariance) array is to be of
order LXL; this is the LXL array that is transformed in step 11. However, M
in step 9 must be much larger than L. The reason for this is that we must
select the autocorrelation values with the appropriate lags from the MXM
matrix to construct the LXL array. This is best illustrated by examining
Equation (4) again. Here $A_4$ is a 4X4 matrix with the various $R_{ij}$ entries.
However, note that the maximum autocorrelation lag for this matrix is only
unity in any one direction. If a matrix with autocorrelation values for
larger lags is desired then the extension process must be continued. In
general one can convince oneself that $M = \left(\frac{L-1}{2} + 1\right)^2$ and that L is odd
because of the symmetry of the autocorrelation function.

The computational complexity of this algorithm may be estimated as
follows:

i) less than $3M^2$ operations are required at step 5

ii) less than $2M^2$ operations are required at step 6

iii) less than $6M^2$ operations are required at step 7

This gives less than $11M^2$ operations per iteration. In the case where we
start with N=5 and extend to a 121X121 matrix, then less than $19 \times 10^6$
operations are needed, not including the FFT nor the necessary row and column
interchange operations. This corresponds to an LXL array of 21X21. For the case
of a 3X3 data (autocorrelation) array (a 5X5 covariance matrix) the
computation time on an Amdahl 470 was 10 sec for extrapolation to a 21X21 (LXL)
covariance array and 54 sec for extrapolation to a 29X29 (LXL) covariance array
including the appropriate FFT calculations. The storage required is 150 Kbytes
(single precision) which exceeds Woods' 56 Kbytes but our program appears to
execute faster and does require additional storage for the FFT.


We illustrate this algorithm in Figures 3, 4, and 5 where the spectra of
the actual 29X29 (LXL) covariance data, the 5X5 covariance data (extended with
zeros only), and the algorithmically extended data are shown. Only one
quadrant of the spectrum is plotted for convenience. The original data field are
samples from two sinewaves of frequencies

$f_t$ = 0.207 Hz, $f_x$ = 0.318 (cm)$^{-1}$ and
$f_t$ = 0.41 Hz, $f_x$ = 0.11 (cm)$^{-1}$ respectively.

The noise level is $\sigma^2$ = 0.8 and the signal power is 1 for each sinewave.


These figures show three contour levels at 97%, 50%, and 10% of the
normalized peak value in each power spectrum. Note that the maximum entropy
extended spectrum has well defined peaks at the 50% level, i.e., the peaks are
easily resolved. However, the peaks in the 5X5 covariance array (extended
with zeros to 29X29) are not resolved at the 50% (1/2 power) level.


VI.  CONCLUSION 


The proposed extension of the autocorrelation (covariance) function under
the maximum entropy condition is such that at the Mth step N+M samples (at the
locations specified by the initial available autocorrelation values and the
particular choice of the extrapolation path) of the random field satisfy a
system of partial difference equations, i.e., linear prediction equations.
However, the prediction error is only orthogonal to the samples involved in
the prediction and is not necessarily white. In the one-dimensional case, by
contrast, the maximum entropy extension of the autocorrelation function results
in a predictor whose prediction error is orthogonal to all past samples of the
random process, i.e., the error is white.



One interesting uniqueness problem remains unsolved, namely, the
dependence of the extended autocorrelation function on the extrapolation path. If
the arbitrary choice of an extrapolation path does not yield a unique maximum
entropy spectrum, then it may be possible to determine constraints on the
selection of an extrapolation path which yield a unique spectrum.




REFERENCES 

[1] J.P. Burg, 1967, "Maximum entropy spectral analysis," presented at the
37th meeting Soc. of Exploration Geophysicists, Oklahoma City, OK.

[2] A. van den Bos, 1971, "Alternative interpretation of maximum entropy
spectral analysis," IEEE Trans. Inform. Theory, Vol. 17, pp. 493-494.

[3] J.G. Ables, 1974, "Maximum entropy spectral analysis," Astron.
Astrophys. Suppl., Vol. 15, pp. 353-393.

[4] S.J. Wernecke, Sept./Oct. 1977, "Two-dimensional, maximum entropy
reconstruction of radio brightness," Radio Sc., Vol. 12, No. 5, pp. 831-844.

[5] S.J. Wernecke and L.R. D'Addario, April 1977, "Maximum entropy image
reconstruction," IEEE Trans. Computers, Vol. C-26, pp. 351-364.

[6] J. Makhoul, 1975, "Linear prediction: a tutorial review," Proc. IEEE,
Vol. 63, pp. 561-580.

[7] J.D. Markel and A.H. Gray, Jr., 1976, "Linear Prediction of Speech," New
York: Springer-Verlag.

[8] C.E. Shannon and W. Weaver, 1964, "The Mathematical Theory of
Communication," Urbana: The University of Illinois Press, pp. 93-95
(paperback edition).

[9] S. Treitel, P.R. Gutowski, and E.A. Robinson, 1977, "Empirical spectral
analysis revisited," to appear in J.H.J. Miller (Ed.), Topics in
Numerical Analysis, Vol. 3, New York: Academic Press.

[10] T. Barnard and J.P. Burg, May 1969, "Analytical studies of techniques
for the computation of high-resolution wavenumber spectra," Advanced
Array Research Report #9, Texas Instruments, Inc., (AD 855345).

[11] J.E.B. Ponsonby, 1973, "An entropy measure for partially polarized
radiation and its application to estimating radio sky polarization
distributions from incomplete 'Aperture Synthesis' data by the maximum
entropy method," Mon. Not. R. Astr. Soc., Vol. 163, pp. 369-380.

[12] B.R. Frieden, April 1972, "Restoring with maximum likelihood and
maximum entropy," J. Opt. Soc. Am., Vol. 62, pp. 511-518.

[13] B.R. Frieden, 1975, "Image enhancement and restoration," in T.S. Huang
(Ed.), Topics in Applied Physics, Vol. 6, New York: Springer-Verlag,
pp. 177-248.

[14] B.R. Frieden and D.C. Wells, Jan. 1978, "Restoring with maximum
entropy, III. Poisson sources and backgrounds," J. Opt. Soc. Am., Vol.
68, pp. 93-103.

[15] R. Kikuchi and B.H. Soffer, Dec. 1977, "Maximum entropy image
restoration. I. The entropy expression," J. Opt. Soc. Am., Vol. 67,
pp. 1656-1665.

[16] J.W. Woods, Sept. 1976, "Two-dimensional Markov spectral estimation,"
IEEE Trans. Inform. Theory, Vol. IT-22, pp. 552-559.

[17] C. Ong, April 1971, "An investigation of two new high-resolution two
dimensional spectral estimate techniques," Long Period Array Processing
Report #1, Texas Instruments, Inc.

[18] J. Capon, August 1969, "High-resolution frequency-wavenumber spectrum
analysis," Proc. IEEE, Vol. 57, pp. 1408-1418.

[19] C.S. Halpeny and D.G. Childers, June 1975, "Composite wavefront
decomposition via multidimensional digital filtering of array data," IEEE
Trans. on Circuits and Systems, Vol. CAS-22, pp. 552-562.

[20] D.E. Smylie, G.K.C. Clarke, and T.J. Ulrych, 1973, "Analysis of
irregularities in the earth's rotation," in B. Alder, S. Fernbach, and M.
Rotenberg (Eds.), Methods in Computational Physics, Vol. 13, B.A. Bolt
(Ed.), New York: Academic Press, Inc., pp. 391-430.

[21] T.J. Ulrych and T.N. Bishop, February 1975, "Maximum entropy spectral
analysis and autoregressive decomposition," Rev. Geophys. and Space
Phys., Vol. 13, pp. 183-200.


ACKNOWLEDGEMENT 

This  work  was  supported  in  part  by  the  Naval  Coastal  Systems  Laboratory, 
Panama  City,  Florida. 


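*The recursion relation is

$$
A_{M+1}^{-1} =
\begin{bmatrix}
A_M^{-1} + \dfrac{(A_M^{-1} \underline{R}_e)(A_M^{-1} \underline{R}_e)^T}{R(0,0) - \underline{R}_e^T A_M^{-1} \underline{R}_e}
& \dfrac{-(A_M^{-1} \underline{R}_e)}{R(0,0) - \underline{R}_e^T A_M^{-1} \underline{R}_e} \\[2ex]
\dfrac{-(A_M^{-1} \underline{R}_e)^T}{R(0,0) - \underline{R}_e^T A_M^{-1} \underline{R}_e}
& \dfrac{1}{R(0,0) - \underline{R}_e^T A_M^{-1} \underline{R}_e}
\end{bmatrix}
$$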
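
A minimal sketch of this bordering update, checked against a direct inverse, is given below. The function name `border_inverse` and the random positive definite test matrix are assumptions made for the illustration; they do not come from the paper.

```python
import numpy as np

def border_inverse(P_M, R_e, R00):
    """Return inv(A_{M+1}) from P_M = inv(A_M), where
    A_{M+1} = [[A_M, R_e], [R_e^T, R00]] (hypothetical helper)."""
    b = P_M @ R_e                     # A_M^{-1} R_e
    err = R00 - R_e @ b               # prediction error variance
    if err <= 0.0:
        raise ValueError("prediction error must be positive")
    top_left = P_M + np.outer(b, b) / err
    return np.block([[top_left, -b[:, None] / err],
                     [-b[None, :] / err, np.array([[1.0 / err]])]])

# Arbitrary positive definite 5x5 matrix used only as a test case.
rng = np.random.default_rng(0)
G = rng.standard_normal((6, 5))
C = G.T @ G + 0.1 * np.eye(5)

P4 = np.linalg.inv(C[:4, :4])
P5 = border_inverse(P4, C[:4, 4], C[4, 4])
print(np.max(np.abs(P5 - np.linalg.inv(C))))
```

Each update costs on the order of M^2 operations (one matrix-vector product and one outer product), which is in line with the per-step operation counts quoted in Section V.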


Figure 1. Spatio-temporal location of random data field samples.

Figure 2. Spatio-temporal location of known autocorrelation
(covariance) values.

Figure 3. Spectrum of two sinusoids calculated by transforming
the truncated (29X29) known spatio-temporal autocorrelation
matrix. Contours are at 97%, 50%, and 10%
of the maximum peak spectral value.
[Plot titled "Two Dimensional Spectrum"; frequency axis in Hz, 0.0 to 0.90.]


Figure 4. Spectrum of two sinusoids calculated by transforming
the truncated (5X5) known spatio-temporal autocorrelation
matrix with zeros appended to extend the
autocorrelation matrix to 29X29. Contours are the
same as for Figure 3.
[Plot titled "Two Dimensional Spectrum"; frequency axis in Hz, 0.0 to 0.90.]


Figure 5. Spectrum of two sinusoids calculated via the algorithm
described in the text, by extending the 5X5
autocorrelation matrix to 29X29. Contours are the
same as for Figure 3.
[Plot titled "Two Dimensional Spectrum"; frequency axis in Hz, 0.0 to 0.90.]


Spectral  Estimation  and  Signal  Extrapolation  in  One  and  Two  Dimensions 


Anil  K.  Jain* 

Signal  and  Image  Processing  Laboratory 
Department  of  Electrical  and 
Computer  Engineering 
University  of  California 
Davis,  California  95616 


ABSTRACT

In this paper, we consider extrapolation and spectral estimation of
discrete time (or space) signals in one (or two) dimensions. The paper is
divided into two parts. In the first part, we present some recent results
[6] for extrapolation of bandlimited discrete time signals. These results
show the relationship between several recently reported extrapolation
algorithms by Papoulis [1], Sabri and Steenaart [2], Cadzow [3] and others [4,5]
and also yield some new algorithms. In the second part, we first show how
the one dimensional algorithms could be extended to two dimensions. Then
we introduce a two-dimensional semi-causal prediction algorithm for spectral
estimation of discrete random fields. This algorithm requires solution of
linear equations and realizes a particular minimum variance ARMA model for
the spectral estimate. It gives superior resolution compared to
two-dimensional (FFT based) periodograms or the two-dimensional autoregressive (AR)
spectral estimates.

PART  ONE:  EXTRAPOLATION  OF  DISCRETE  TIME  BANDLIMITED  SIGNALS 

1.1 Problem Definition

A discrete signal y(k), k = 0, ±1, ±2, ..., is called bandlimited if
its Fourier transform

$$
Y(f) = \sum_{k=-\infty}^{\infty} y(k) \exp(-j 2\pi k f) , \quad -\tfrac{1}{2} \le f \le \tfrac{1}{2} \qquad (1)
$$

satisfies the relation

$$
Y(f) = 0 , \quad \tfrac{1}{2} \ge |f| > \sigma \qquad (2)
$$

*Research supported in part by the Army Research Office, Durham, N.C., under
grant No. DAAG29-78G0206 and in part by RADC under a Multi-effort Post
Doctoral program. Paper presented at RADC Spectrum Estimation Workshop, Rome,
N.Y., Oct. 1979.

This implies y(k) comes from a bandlimited continuous signal which is
oversampled with respect to its Nyquist rate. This occurs quite often when a
system observes signals over a wide bandwidth. We are given a set of time
limited, noise free observations

$$
z(k) = \begin{cases} y(k) , & -M \le k \le M \\ 0 , & \text{otherwise.} \end{cases} \qquad (3)
$$

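Given {z(k)}, the problem is to find an estimate of y(k) outside the interval
[-M, M]. We define the infinite vector

$$
y \triangleq [\ldots y(-k) \ldots y(-1), y(0), y(1), \ldots, y(k) \ldots]^T \qquad (4)
$$

a bandlimiting operator L, and a time-limiting operator W, as infinite
matrices

$$
L = \{l_{i,j}\} , \quad l_{i,j} = \frac{\sin 2\pi\sigma(i-j)}{\pi(i-j)} , \quad i,j = 0, \pm 1, \pm 2, \ldots \qquad (5)
$$

$$
W = \{w_{i,j}\} , \quad w_{i,j} = \begin{cases} 1 , & i = j , \; -M \le i,j \le M \\ 0 , & \text{otherwise} \end{cases} \qquad (6)
$$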


1.2 Properties of L

1. Let S be a (2M+1)×∞ matrix operator whose elements are

$$
s_{i,j} = \begin{cases} 1 , & i = j = 0, \pm 1, \pm 2, \ldots, \pm M \\ 0 , & \text{otherwise} \end{cases} \qquad (7)
$$

Basically, S maps (2M+1) elements from an infinite vector into a finite
vector. Consider the (2M+1)×(2M+1) matrix $\hat{L} = S L S^T$ and the matrix $S^T S = W$.
The operator W replaces the elements of an infinite vector outside the
interval [-M, M] by zeros. $S^T$ extrapolates the elements of a (2M+1)×1 vector
by zeros.

2. The operator L is idempotent, i.e.,

$$
L^2 = L . \qquad (8)
$$

3. For every M < ∞, $\hat{L}$ is positive definite. Moreover, all the eigenvalues
of $\hat{L}$ lie in the interval (0,1), i.e., $0 < \lambda(\hat{L}) < 1$, M < ∞.

For proofs of these and other properties see Jain and Ranganath [6].
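
Property 3 can be probed numerically with a short sketch. It assumes the usual ideal low-pass kernel sin(2πσ(i-j))/(π(i-j)) for the entries of L, and the values M = 8 and σ = 0.1 are chosen only for illustration. Because the smallest eigenvalues are far below machine precision, the computed minimum may come out as a tiny number of either sign.

```python
import numpy as np

def l_hat(M, sigma):
    """(2M+1)x(2M+1) section S L S^T of the bandlimiting operator,
    assuming entries sin(2*pi*sigma*(i-j)) / (pi*(i-j))."""
    idx = np.arange(-M, M + 1)
    d = idx[:, None] - idx[None, :]
    # np.sinc(x) = sin(pi x)/(pi x), so this equals the kernel above.
    return 2.0 * sigma * np.sinc(2.0 * sigma * d)

Lh = l_hat(M=8, sigma=0.1)
eigvals = np.linalg.eigvalsh(Lh)      # ascending order, Lh is symmetric
cond_estimate = eigvals[-1] / max(abs(eigvals[0]), np.finfo(float).tiny)
print(eigvals[-1], eigvals[0], cond_estimate)
```

The rapid decay of the eigenvalues toward zero is what makes the matrix so ill-conditioned in practice, even though it is positive definite in exact arithmetic.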

1.3 Matrix Formulation of the Extrapolation Problem

Let y(k), k = 0, ±1, ..., be a discrete-time, bandlimited signal as defined in
(1) and (2). If z denotes a (2M+1)×1 vector of the observations, then z = Sy.
Since y is bandlimited, it must satisfy Ly = y, so that we can write

$$
z = SLy . \qquad (9)
$$

Now, if we define a (2M+1)×∞ matrix

$$
H = SL \qquad (10)
$$

then (9) becomes

$$
z = Hy \qquad (11)
$$

where z is a (2M+1)×1 vector and y is an infinite vector. The extrapolation
problem is now, simply, to find an estimate of y given z. Equation (11) as
such does not have a unique solution because H is a rectangular matrix. In
other words, for discrete bandlimited signals given over a finite interval,
it is not possible to extrapolate uniquely. One way to make the solution of
(11) unique is to look for the minimum norm least squares (MNLS) solution
defined as

$$
y^+ = \min \{ \|y\|^2 ; \; H^T H y = H^T z \} , \quad \|x\|^2 = x^T x
$$

Note that a solution of $H^T H y = H^T z$ minimizes the least squares error
$\|z - Hy\|$. $y^+$ is that least squares solution which has the minimum norm.
Conceptually, the problem is quite straightforward now and a large number of
algorithms are available to find $y^+$. Using the definition of H [see (10)] and
the properties of L, these algorithms assume simple form in many cases and
will be briefly stated.

We note that a continuous bandlimited signal given over any finite interval
is analytic and can, therefore, be extrapolated uniquely to its original
values outside the given interval.

1.4 Iterative Extrapolation Algorithms: Let $y_n$ represent an estimate of y at
the nth iteration. Following [6], we can write down the gradient and the
conjugate gradient algorithms as follows.

1.4.1 The Gradient Method

$$
y_{n+1} = y_n + \epsilon H^T (z - H y_n) , \quad 0 < \epsilon < 2/\lambda_{\max}(H^T H) \qquad (12)
$$


Using (10) and the properties of L, this equation simplifies to give

$$
y_{n+1} = \epsilon f_1 + (I - \epsilon L W) y_n , \quad y_0 = 0 ; \quad f_1 \triangleq H^T z = L S^T z \qquad (13)
$$

Convergence is achieved for $0 < \epsilon < 2 < 2/\lambda_{\max}(\hat{L})$. Now it can be shown by
induction that since $f_1$ is bandlimited, i.e., $L f_1 = f_1$, each $y_n$ is also
bandlimited. Hence, (13) can be written as

$$
y_{n+1} = \epsilon f_1 + L (I - \epsilon W) y_n \qquad (14)
$$

For ε = 1, this becomes the discrete version of Papoulis' iterative algorithm
[1] reported in [2]. However, the convergence would be best (for constant
ε) if we let

$$
\epsilon = \epsilon_{opt} = 2 / \left( \lambda_{\min}(H^T H) + \lambda_{\max}(H^T H) \right) = 2/\lambda_{\max}(\hat{L}) > 2 \qquad (15)
$$

Further improvement in convergence is obtained if we go to the steepest descent
[6] or the conjugate gradient algorithm. Thus, it is seen that Papoulis'
method, based on a successive energy reduction method (see Gerchberg [4]),
is a special case in the class of one step gradient methods. It is easily
shown that $y_n$ converges to $y^+$ [6].
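
The iteration (14) is easy to try on a finite grid. The sketch below is an illustration under stated assumptions, not the authors' program: the infinite operator L is replaced by an ideal DFT-domain mask on a periodic grid of N points (so the stand-in is an exact projector), the observation window occupies the first 2M+1 grid points, and all names and parameter values are invented.

```python
import numpy as np

def bandlimit(x, keep):
    """Stand-in for L on a periodic grid: zero the DFT bins where keep == 0."""
    return np.real(np.fft.ifft(np.fft.fft(x) * keep))

def papoulis_iteration(z_padded, window, keep, eps=1.0, n_iter=500):
    """Iterate y_{n+1} = eps*f1 + L(I - eps*W) y_n, y_0 = 0, f1 = L S^T z."""
    f1 = bandlimit(z_padded, keep)
    y = np.zeros_like(z_padded)
    for _ in range(n_iter):
        y = eps * f1 + bandlimit(y - eps * window * y, keep)
    return y

N, M, B = 64, 16, 3                    # grid length, half window, half band (bins)
bins = np.fft.fftfreq(N) * N           # integer DFT bin indices
keep = (np.abs(bins) <= B).astype(float)

rng = np.random.default_rng(0)
y_true = bandlimit(rng.standard_normal(N), keep)   # bandlimited test signal
window = np.zeros(N)
window[:2 * M + 1] = 1.0               # W: ones on the observed samples
z_padded = window * y_true             # S^T z: observations padded with zeros

y_est = papoulis_iteration(z_padded, window, keep)
err_initial = np.linalg.norm(y_true)               # error of the start y_0 = 0
err_final = np.linalg.norm(y_est - y_true)
print(err_initial, err_final)
```

With eps = 1 each pass re-inserts the known samples and low-pass filters the result, so the error cannot grow from one iteration to the next; how quickly it falls depends on the smallest eigenvalues of the windowed band-limiting operator.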

1.4.2 Conjugate Gradient Method

Using the property $L^2 = L$, the gradient vectors $\{g_k\}$ and the conjugate
direction vectors $\{d_k\}$ can be shown to be bandlimited and this algorithm
becomes

$$
\begin{aligned}
y_{k+1} &= y_k + \alpha_k d_k , \quad y_0 = 0 \\
d_{k+1} &= -g_{k+1} + \beta_k d_k , \quad d_0 = -g_0 \\
g_k &= L W y_k - L S^T z = g_{k-1} + \alpha_{k-1} L W d_{k-1} \\
\beta_k &= \sum_{n=-M}^{M} g_{k+1}(n) d_k(n) \Big/ \sum_{n=-M}^{M} (d_k(n))^2 \\
\alpha_k &= -\sum_{n=-M}^{M} d_k(n) \left( y_k(n) - z(n) \right) \Big/ \sum_{n=-M}^{M} (d_k(n))^2
\end{aligned}
\qquad (16)
$$

This is a two step gradient algorithm and, iteration by iteration, has better
convergence than the ordinary gradient method discussed earlier.


1.5 Generalized Inverse Extrapolation Filter

The generalized inverse of H is given by

$$
H^+ = H^T (H H^T)^{-1} = L S^T (S L S^T)^{-1} = L S^T \hat{L}^{-1} \qquad (17)
$$

which exists since $\hat{L}$ is nonsingular. Hence we can directly evaluate the
MNLS estimate of y as

$$
y^+ = H^+ z \qquad (18)
$$

In practice, $\hat{L}$ could be quite ill-conditioned depending on σ and M and has
to be stabilized [6]. This can be done by either using a singular value
expansion in which terms corresponding to small eigenvalues of $\hat{L}$ are
discarded or by adding a small positive quantity to the diagonal terms of $\hat{L}$
in (17). The generalized inverse of (17) is called the Extrapolation Matrix
in the context of signal extrapolation. It has appeared in the context of
image restoration, see e.g., Helstrom, Rino, Jain and others [7-9]. Recently,
it has also been derived by Cadzow by a different procedure for signal
extrapolation. Another extrapolation matrix (whose size is infinite, if we
extrapolate the signal to infinity) has been suggested by Sabri et al [2]. In
[6] we show that this matrix does not exist and its finite approximation (used
in [2]) is ill-conditioned.
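
A minimal sketch of the extrapolation matrix in (17), stabilized by adding a small quantity to the diagonal, is given below. It assumes the ideal low-pass kernel sin(2πσ(i-j))/(π(i-j)) for L, truncates the infinite output range to [-K, K], and uses a test signal built as a combination of shifted kernels so that the true extrapolation is known; the loading value `mu` and all names are illustrative choices rather than values from the paper.

```python
import numpy as np

def sinc_block(rows, cols, sigma):
    """Entries sin(2*pi*sigma*(i-j)) / (pi*(i-j)) for i in rows, j in cols."""
    d = rows[:, None] - cols[None, :]
    return 2.0 * sigma * np.sinc(2.0 * sigma * d)

sigma, M, K = 0.1, 8, 40
win = np.arange(-M, M + 1)             # observed indices
ext = np.arange(-K, K + 1)             # indices to extrapolate onto

L_hat = sinc_block(win, win, sigma)    # S L S^T
L_St = sinc_block(ext, win, sigma)     # rows -K..K of L S^T

# Test signal in the range of L S^T, so its samples determine it exactly.
rng = np.random.default_rng(1)
c = rng.standard_normal(win.size)
y_true = L_St @ c                      # bandlimited signal on -K..K
z = L_hat @ c                          # its samples on the window

mu = 1e-8                              # diagonal loading (assumed value)
y_plus = L_St @ np.linalg.solve(L_hat + mu * np.eye(win.size), z)
print(np.linalg.norm(y_plus - y_true))
```

Larger values of `mu` trade extrapolation accuracy for robustness, which matters once the observations contain noise.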

1.6 Discrete Prolate Spheroidal Wave Functions and Singular Value Expansion

It is known that a continuous band-limited signal can be extrapolated
outside its observation interval, exactly, via the PSWF expansion [10]. In
the case of discrete signals, a similar expansion is possible when we
consider the minimum norm least squares extrapolated estimate via the singular
value expansion of the matrix H [6].

Papoulis and Bertram [11], Slepian [5], Jain and Ranganath [6], Algazi
[12], and others have studied the properties and applications of these
functions in digital signal processing. For singular value decomposition of
H = SL, we consider the eigenvalue problems associated with $H^T H$ and $H H^T$, i.e.,

$$
L W L \phi_k = \lambda_k \phi_k , \qquad (19)
$$

$$
S L S^T \psi_k = \lambda_k \psi_k , \quad -M \le k \le M \qquad (20)
$$

where $\lambda_k > 0$, and $\{\phi_k\}$ are ∞×1 and $\{\psi_k\}$ are (2M+1)×1 orthonormal vectors. It
can be shown that $\phi_k$ must be bandlimited vectors (i.e., $L\phi_k = \phi_k$), and
are related to $\psi_k$ via the relations

$$
\psi_k = \lambda_k^{-1/2} S \phi_k \qquad (21)
$$

$$
\phi_k = \lambda_k^{-1/2} L S^T \psi_k \qquad (22)
$$

Equation (21) states that the (2M+1)×1 vector $\psi_k$ is simply obtained by selecting
the (2M+1) elements $\{\phi_k(m), -M \le m \le M\}$ of $\phi_k$ and scaling them by $\lambda_k^{-1/2}$.
Equation (22) is remarkable in that the ∞×1 vector $\phi_k$ is obtained by simply
low pass filtering the sequence $\{\psi_k(m)\}$ and scaling the result by $\lambda_k^{-1/2}$.
This means $\phi_k$ is the extrapolation of $\psi_k$, obtained by simple low pass
filtering and scaling. Also noteworthy is the fact that the sequence
$\{\phi_k(m), -\infty < m < \infty\}$ is orthogonal over the interval $-M \le m \le M$ as well as
over the infinite interval. This property is similar to that of the
continuous PSWFs. The extrapolated signal is obtained by writing the singular
value expansion

$$
H^+ = \sum_{k=-M}^{M} \lambda_k^{-1/2} \phi_k \psi_k^T \qquad (23)
$$

which gives $y^+ = H^+ z$, as

$$
y^+(m) = \sum_{k=-M}^{M} \frac{a_k}{\lambda_k} \phi_k(m) , \quad a_k = \lambda_k^{1/2} \psi_k^T z = \sum_{m=-M}^{M} \phi_k(m) y(m) \qquad (24)
$$

It is easy to check that $y^+(m) = y(m)$ for $m \in [-M, M]$.

1.7 Mean Square Extrapolation Filter

In the presence of additive noise, uncorrelated with y, (11) is modified
to give

$$
z = SLy + n = Hy + n \qquad (25)
$$

where n is the (2M+1)×1 noise vector. Now we look for the best linear mean
square extrapolation of z, which is given by the Wiener filter estimate

$$
\hat{y} = R L^T S^T (S L R L^T S^T + R_n)^{-1} z \qquad (26)
$$

where R and $R_n$ are the autocorrelation matrices of y and n respectively.
Since {y(k)} is a bandlimited signal, $L R L^T = L [E y y^T] L^T = E[(Ly)(Ly)^T] =
E y y^T = R$. Hence, the above equation becomes

$$
\hat{y} = R L S^T (S R S^T + R_n)^{-1} z
$$

In the worst case when we do not know R, one may set R = L. Now, if $R_n \to 0$,
$\hat{y} \to y^+$, the MNLS extrapolated estimate.

1.8 Recursive Extrapolation

Now we present a recursive least squares algorithm based on Kalman
filtering techniques where the extrapolated signal estimate is updated
recursively as a new observation sample arrives. From (25), the kth
observation z(k) can be written as

$$
z(k) = h_k^T y + n_k , \quad k = 0, 1, \ldots \qquad (27)
$$

where $h_k^T$ is the kth row of L and $n_k$ is zero mean white Gaussian noise.
The state equation for the unknown extrapolated vector y can be written as

$$
y_{k+1} = y_k , \quad \mathrm{cov}(y_0) = P_0 = L = \{l_{k,\ell}\} \qquad (28)
$$

The Kalman filter associated with equations (27)-(28) is called the recursive
least squares filter and is given by

$$
\hat{y}_{k+1} = \hat{y}_k + g_k \left( z(k) - h_k^T \hat{y}_k \right) , \quad \hat{y}_0 = 0 \qquad (29)
$$

where $\hat{y}_k$ is the kth estimate of y and $g_k$ is the Kalman filter gain. Using the
properties of L, the Riccati equation which is associated with the Kalman
filter can be simplified considerably. Details are given in [6].


Examples:

Let the observations model be

$$
z(k) = \sin(.099\pi k) + \sin(.085\pi k) + n_k , \quad -8 \le k \le 8
$$

It is known that the spectrum of the signal lies in the interval
[-.1, .1], i.e., σ = 0.1. Figures 1 and 2 show the original signal y(k) and the
observations when there is no noise. Figures 3, 6, and 4 respectively show
the extrapolated estimates obtained by the Papoulis (30 iterations), the
conjugate gradient (10 iterations), and the generalized inverse algorithms. In
theory, these algorithms are equivalent. However, due to differences in their
numerical properties, the results are different. Faster convergence of the
conjugate gradient method is evident from Figs. 3 and 6. Although the
generalized inverse of (17) always exists, it exhibits instability (Fig. 4)
due to ill conditioning of $\hat{L}$. However, the stabilized inverse of (24)
improves the extrapolated estimate (Fig. 5) greatly. When the observations
contain small noise (SNR = 21.6 dB) performance of these algorithms is
considerably degraded. For example, Fig. 7 shows the result of the conjugate gradient
method. This is not unexpected because the noise was not accounted for in the
extrapolation algorithm. The mean square extrapolation filter (Fig. 8)
greatly improves the result. Applications of these algorithms in spectral
estimation and radar signal processing are considered in [6].

PART TWO: TWO DIMENSIONAL EXTRAPOLATION AND SPECTRAL ESTIMATION

2.1 Extrapolation of Bandlimited Sequences

The results of part one can be easily extended to a two dimensional
bandlimited sequence y(m,n) which is known over a finite observation window, say,
[-M,M]×[-M,M]. Then using Kronecker products and mapping y(m,n) into a
lexicographically ordered vector y(k), we can write an equation analogous
to (9) as

$$
z = \mathcal{S} \mathcal{L} y , \quad \mathcal{S} = S \otimes S , \quad \mathcal{L} = L \otimes L \qquad (30)
$$

The properties of the two dimensional low pass operator $\mathcal{L}$ can be
determined and the two dimensional version of the various algorithms can be
derived. For example, the gradient algorithm corresponding to (13) becomes
(after using properties of Kronecker products of matrices)

$$
Y_{n+1} = \epsilon F_1 + Y_n - \epsilon L W Y_n W L , \quad F_1 = L S^T Z S L , \quad Y_0 = 0 \qquad (31)
$$

where Z is the matrix of observations on [-M,M]×[-M,M], and $Y_n$ is the (∞×∞)
extrapolated MNLS estimate of Y at iteration n. This algorithm is simple in
that it requires separable row by row and column by column operations.
Details of this and other two dimensional versions of the foregoing algorithms
will appear elsewhere.

2 . 2  Two  Dimensional  Maximum  Entropy  Spectral  Estimation 

For  a bandl imi ted  sequence,  a  spectral  estimate  can  be  obtained  from  the 
Fourier  transform  of  the  extrapolated  signal.  The  foregoing  extrapolation 
methods  become  inapplicable  when  sampling  is  done  at  the  Nyquist  rate.  For 
one  dimensional  signals,  autoregressive  (tAR)  or  equivalently  the  maximum 
entropy  (ME)  method  have  been  found  useful  in  obtaining  high  resolution 
spectral  estimates.  In  this  case,  the  solution  of  the  nonlinear  ME  spectral 
estimate  equations  reduces  to  an  equivalent  set  of  linear  Toeplitz,  AR 
equations  which  are  easily  solvable.  This  equivalence  between  the  ME  and  AR 
methods  is  due  to  the  existence  of  spectral  factorization  theorem  in  one 
dimension.  In  two  dimensions  ,  if  we  are  given,  say,  the  autocorrelations 
{ r  1  on  a  rectangular  window  W*  =  [-p,pjx[-q,q],  the  ME  estimate  of  the 

III  )  f  I 

spectral  density  function  (SDF)  must  be  the  form 


»i  iimn  in 


rttffei 


202 


s(z1 »zo )  - 


i  l  l 

Lm.neW* 


,  in  n 
Jm,n2l  2 2 


where $\{a_{m,n}\}$ must be determined from $\{r_{m,n}\}$ given on W*. Now,
positive definiteness of the sequence $\{r_{m,n}\}$ on W* is not sufficient
to guarantee a positive ME spectral estimate. In other words, the array
$\{r_{m,n}\}$ defined on W* need not have any positive definite extension
[14,15]. Besides existence difficulties, the nonlinear problem of
determining $a_{m,n}$ can no longer be reduced to a linear problem of
autoregression (as in the 1-D case). This is because factorization of a two
dimensional rational SDF as a product of two complex conjugate rational
functions is not always possible. Hence, even if the ME solution existed,
one would generally have to resort to iterative methods, e.g., see [13].
Unfortunately, there are convergence difficulties and the results do not
seem to be very attractive.
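The one dimensional reduction mentioned above (ME estimate obtained from
linear Toeplitz AR equations) can be sketched as follows. This is only an
illustrative sketch, not the author's implementation: the function names
`ar_from_autocorrelation` and `ar_spectrum`, the use of a dense solver
instead of a fast Toeplitz recursion, and the test sequence $r(k)=\rho^{|k|}$
are all assumptions made for the illustration.

```python
import numpy as np

def ar_from_autocorrelation(r):
    """Solve the order-p Toeplitz (Yule-Walker) equations.

    r : real autocorrelation lags r[0], ..., r[p].
    Returns (a, sigma2) for the predictor u_i ~ sum_m a[m-1] * u_{i-m}.
    """
    r = np.asarray(r, dtype=float)
    p = len(r) - 1
    # Toeplitz matrix R[i, j] = r(|i - j|), i, j = 0..p-1
    lag = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    R = r[lag]
    a = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - a @ r[1:]            # prediction error variance
    return a, sigma2

def ar_spectrum(a, sigma2, freqs):
    """All-pole SDF sigma2 / |1 - sum_m a_m exp(-j 2 pi f m)|^2."""
    m = np.arange(1, len(a) + 1)
    A = 1.0 - np.exp(-2j * np.pi * np.outer(freqs, m)) @ a
    return sigma2 / np.abs(A) ** 2

# Assumed test case: first-order Markov autocorrelation, lags 0, 1, 2
rho = 0.8
a, sigma2 = ar_from_autocorrelation(rho ** np.arange(3))
spectrum = ar_spectrum(a, sigma2, np.linspace(0.0, 0.5, 6))
```

For this assumed test sequence the second coefficient vanishes, which
reflects the fact that a first-order model already matches the given lags.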

2.3 Discrete Random Fields and their SDFs

An obvious extension of the one dimensional AR spectral estimation method is
to assume that the SDF to be estimated has a stationary random field
realization

$$ u_{i,j} = \sum\sum_{(m,n)\in S} a_{m,n}\, u_{i-m,j-n} + \epsilon_{i,j}; \qquad E\,u_{i,j}u_{i+k,j+l} = r(k,l) $$  (33)

where $\{u_{i,j}\}$ is a two dimensional discrete random field and S is a
set of suitably chosen index pairs (m,n). We define

$$ \bar{u}_{i,j} \triangleq \sum\sum_{(m,n)\in S} a_{m,n}\, u_{i-m,j-n} $$  (34)

as the prediction estimate of $u_{i,j}$, and $\epsilon_{i,j}$ is the
prediction error. We consider three types of predictors characterized by S
as (see Fig. 9)

$S = \{n \ge 1, \forall m\} \cup \{n = 0, m \ge 1\}$ : Causal model  (35a)
$S = \{n \ge 1, \forall m\} \cup \{n = 0, \forall m \ne 0\}$ : Semicausal model  (35b)
$S = \{\forall (m,n) \ne (0,0)\}$ : Noncausal model  (35c)
The "causal model" (35a) defines causality in the sense of raster scanning
the field $\{u_{i,j}\}$ column by column. Often one is interested only in
representations where $a_{m,n}$ are non-zero only over a finite window W,
called the prediction window, which is a subset of S. In that event (33) is
a stochastic difference equation which realizes the rational SDF

$$ S_u(z_1,z_2) = \frac{S_\epsilon(z_1,z_2)}{\left|1 - \sum\sum_{(m,n)\in W} a_{m,n}\, z_1^{-m} z_2^{-n}\right|^2} $$  (36)

where $S_\epsilon$ is the SDF of $\{\epsilon_{i,j}\}$. In general,
$\{\epsilon_{i,j}\}$ could be a moving average field.

†This was pointed out to the author by B. Dickinson [15].

2.4 Minimum Variance Representations (MVRs)

For a given W, the minimum variance representation of the random field
requires that $E\,\epsilon_{i,j}u_{i-m,j-n} = 0$ for $(m,n) \in S$, which
gives

$$ r(k,l) - \sum\sum_{(m,n)\in W} a_{m,n}\, r(k-m,l-n) = \beta^2 \delta_{k,0}\delta_{l,0}, \qquad (k,l) \in S_0 $$  (37)

where $S_0 = S \cup (0,0)$. Defining $a_{0,0} = -1$ and $W_0 = W \cup (0,0)$,
and mapping the arrays $\{a_{m,n}\}$, $\{u_{m,n}\}$ defined on $W_0$ into
vectors a and u respectively, (37) reduces to the equation

$$ R a = -\beta^2 \mathbf{1} \quad \text{or} \quad a = -\beta^2 R^{-1}\mathbf{1} = -\beta^2 b $$  (38)

where R is the covariance matrix of the vector u. The vector 1 takes the
value 1 at a location, say $i_0$, which corresponds to the (0,0) location in
the window $W_0$, and is zero elsewhere; b is the $i_0$th column of
$R^{-1}$. Since $a_{0,0} = -1$, one obtains from (38)

$$ \beta^2 = 1 / b(i_0) $$  (39)

S  will!  Cfo7^eandfT  (3S1'  ThU5>  "  need  tte 
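As a rough numerical illustration of (38) and (39), the sketch below builds
the covariance matrix R for an arbitrary window, solves for the $i_0$th
column of $R^{-1}$, and returns the coefficients and the prediction error
variance. The function name `mvr_coefficients`, the dictionary output, and
the separable test covariance $r(k,l)=\rho^{|k|+|l|}$ are assumptions chosen
for the illustration; the lag convention $R_{pq} = r(m_p-m_q, n_p-n_q)$ is
also an assumption, adequate for symmetric covariances.

```python
import numpy as np

def mvr_coefficients(r, window0):
    """Solve R a = -beta^2 1 for a minimum variance representation.

    r       : callable r(k, l), autocorrelation at lag (k, l)
    window0 : list of (m, n) index pairs forming W0; must contain (0, 0)
    Returns (a, beta2): a maps (m, n) -> a_{m,n} with a_{0,0} = -1,
    beta2 is the prediction error variance.
    """
    window0 = list(window0)
    i0 = window0.index((0, 0))
    # Covariance matrix of the vector u: lag between window positions
    R = np.array([[r(m1 - m2, n1 - n2) for (m2, n2) in window0]
                  for (m1, n1) in window0], dtype=float)
    unit = np.zeros(len(window0))
    unit[i0] = 1.0
    b = np.linalg.solve(R, unit)     # i0-th column of R^{-1}
    beta2 = 1.0 / b[i0]              # equation (39)
    a_vec = -beta2 * b               # equation (38)
    return dict(zip(window0, a_vec)), beta2

# Assumed test case: separable covariance, single quadrant window
rho = 0.9
r = lambda k, l: rho ** (abs(k) + abs(l))
a, beta2 = mvr_coefficients(r, [(m, n) for m in (0, 1) for n in (0, 1)])
```

For this separable test covariance the solution reproduces the familiar
first-order separable model, with $a_{1,0}=a_{0,1}=\rho$ and
$a_{1,1}=-\rho^2$.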

2.5 Causal Models and AR Spectral Estimation

A common example of causal prediction is to consider $W_0$ = [0,p]x[0,q],
which gives rise to a (single quadrant) predictor, causal in each dimension.
The orthogonality condition for these models requires $\epsilon_{i,j}$ to be
a white noise field, so that $S_\epsilon = \beta^2$ and the spectral density
function is

$$ S_u(z_1,z_2) = \frac{\beta^2}{\left|1 - \sum\sum_{(m,n)\in W} a_{m,n}\, z_1^{-m} z_2^{-n}\right|^2}, \qquad |z_1| = 1,\ |z_2| = 1 $$  (40)

where $\{a_{m,n}\}$ are obtained via (38) and (39) with $i_0 = 1$, and R is
a (p+1)x(p+1) block Toeplitz matrix of basic dimension (q+1)x(q+1), i.e.,
$R = \{R_{i-j}\}$, $0 \le i,j \le p$, where $R_k = \{r(k,i-j)\}$,
$0 \le i,j \le q$. Comparison with (32) shows that the SDF estimated via
(40) is of the form of (32), since we can write the denominator polynomial
in (40) as in (32) via the relation

$$ \tilde{a}_{m,n} = \sum_{i=0}^{p}\sum_{j=0}^{q} a_{i,j}\, a_{i+m,j+n}, \qquad (m,n) \in W^* $$  (41)

where $a_{i,j} = 0$ if $(i,j) \notin W_0$ and W* is the window
[-p,p]x[-q,q]. The elements of the block Toeplitz matrix R are defined on
this window. However, in general, the coefficients $\{a_{m,n}\}$ (and the
SDF) obtained via (38)-(41) are not the quantities that one would obtain by
solving the ME nonlinear equations. Equivalently, the covariances realized
by the random field of (33) whose $\{a_{m,n}\}$ are determined via (38) and
(39) (from a given positive definite R obtained from given autocorrelations
r(k,l) on W*) need not match exactly the given values r(k,l) on the window
W*. If the two sets of autocorrelations matched, then (36) would also be the
ME spectrum.


Example. 


As  an  example,  consider  the  autocorrelation  model. 


$$ r(m,n) = \cos 2\pi(0.05m + 0.2n) + 0.6\cos 2\pi(0.2m + 0.05n) + 0.25\,\delta_{m,0}\delta_{n,0} $$  (42)

The SDF has two delta functions at (.05, .2) and (.2, .05) in the positive
quadrant of the frequency plane. Assuming r(m,n) are available on a 5x5
window W* = [-2,2]x[-2,2] (p=q=2), the spectrum estimated according to (40)
is shown as a contour plot in Fig. 10. The two original peaks have merged
into a single peak roughly half way between them. Increasing the size of
the observation window to 7x7 or higher did not improve the resolution.
This is so because many covariance functions may not be realizable even by
infinite order ($p = q = \infty$) single quadrant causal models. Other
causal structures such as non-symmetric half plane models may improve
results, but the order of the model required to achieve the desired
resolution may be prohibitively high.

2.6 Spectral Estimation via Semicausal Models


In the case of semicausal models, the prediction window selects samples
which are in the past in one of the directions and in the past as well as
the future in the other direction. Correspondingly, the model allows
prediction on a symmetric half plane. As an example, let the prediction
window be $W_0$ = [-p,p]x[0,q]. The minimum variance condition applied at
n = 0 requires $E\,\epsilon_{i,j}u_{i-m,j} = 0$, $\forall m \ne 0$, which
yields the condition

$$ r_\epsilon(i,j) = \beta^2\left[\delta_{i,0} - \sum_{m=1}^{p} a_{m,0}\left(\delta_{i-m,0} + \delta_{i+m,0}\right)\right]\delta_{j,0} $$  (43)

Hence the SDF of $\epsilon_{i,j}$ is

$$ S_\epsilon(z_1,z_2) = \beta^2\left[1 - \sum_{m=1}^{p} a_{m,0}\left(z_1^{-m} + z_1^{m}\right)\right] $$  (44)

This implies $\{\epsilon_{i,j}\}$ is a moving average field (in the 'i'
variable) and is a white noise field in the 'j' variable. The SDF realized
by this model is of the form

$$ S_u(z_1,z_2) = \frac{S_\epsilon(z_1,z_2)}{\left|1 - \sum\sum_{(m,n)\in W} a_{m,n}\, z_1^{-m} z_2^{-n}\right|^2}, \qquad |z_1| = 1,\ |z_2| = 1 $$  (45)

which, in view of (43), is not necessarily an "all-pole" model (unless
$S_\epsilon$ is a factor in the denominator polynomial). Corresponding to
(38) and (39), here $i_0 = p+1$, R is a (2p+1)x(2p+1) block Toeplitz matrix
of basic dimension (q+1)x(q+1), i.e., $R = \{R_{i-j}\}$, $-p \le i,j \le p$;
$R_k = \{r(k,i-j)\}$, $0 \le i,j \le q$, and a and 1 are vectors of size
(2p+1)(q+1). Thus the elements of R are defined over a window
W* = [-2p,2p]x[-q,q].

Some interesting and important facts about the foregoing semicausal model
are in order. (1) Comparison with (32) shows that (45), in general, is not
a maximum entropy spectrum, since it is not an "all-pole" model. (2) The
semicausal representation of (33), where S=W and $\{\epsilon_{i,j}\}$ is
defined in (43), is a special autoregressive moving average (ARMA) model of
a two dimensional random field. (3) The main equation, (38), for obtaining
the spectral estimate is linear.


Example: Consider the autocorrelation function

$$ r(m,n) = \sin[2\pi(m+n)/8] + \sin[2\pi(m+n)/8.05] + 7\%\ \text{white noise} $$  (46)

and the window $W_0$ = [-2,2]x[0,2]. Thus r(m,n) are available over
W* = [-4,4]x[-2,2]. Figure 11 shows that the FFT based periodogram is unable
to resolve the two closely spaced peaks, which should occur at [0.25, 0.25]
and [0.2445, 0.2445] in the first quadrant. However, the semicausal model
spectrum (Fig. 12) easily resolves these peaks.

Remarks 


1. Semicausal models are useful not only in high resolution spectral
estimation, but also in finding random field realizations of known spectra
(i.e., spectral factorization). For example, consider the irrational
covariance function $r(m,n) = \exp\{-0.05\sqrt{m^2+n^2}\}$. Table 1 compares
the mismatch of the model covariances with the actual covariances on a 5x5
window, for causal and semicausal realizations of equal order. Clearly, the
semicausal model provides a much better fit.


2. Minimum variance semicausal models were first introduced by Jain in
[16] for semirecursive filtering of images. Such models lead to algorithms
which are recursive in one of the (causal) dimensions and are nonrecursive
in the other (noncausal) dimension. Often, the nonrecursive part of the
algorithm can be implemented via a fast unitary transform, yielding an
efficient overall algorithm. Such models also arise when one considers
finite difference approximations of parabolic partial differential equations
[17,18]. Applications of such models in image restoration and data
compression have been studied in [18-21].

3. It can be shown that any minimum variance semicausal model of finite
order (i.e., the transfer function is rational) can be factorized to yield
a minimum variance causal model (i.e., an AR model). In general, this causal
model would be defined on a non-symmetric half plane (NSHP) and would be of
infinite order. In practice, one may obtain an approximate rational, causal
NSHP realization of the semicausal model (or equivalently of its SDF) by a
suitable truncation of this factorization (consistent with stability
requirements). The foregoing procedure is therefore applicable for design
of semicausal and/or causal digital filters whose magnitude of the frequency
response is specified.

4. For separable spectra (i.e., $S(z_1,z_2) = S_1(z_1)S_2(z_2)$), spectral
factorization is possible and therefore the two dimensional causal,
semicausal and noncausal MVRs, as well as the ME method, all yield the same
estimates.
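The statement in Remark 4 can be checked numerically for a simple case. The
sketch below is an illustration under stated assumptions, not the author's
code: it assumes the separable covariance $r(k,l)=\rho^{|k|+|l|}$, first
order windows, and the rational SDF forms used above (white prediction error
for the causal model; a prediction error that is a moving average in the
first variable for the semicausal model). The helper names `solve_mvr`,
`mvr_sdf` and `true_sdf` are hypothetical.

```python
import numpy as np

def solve_mvr(r, window0):
    """Solve (38)-(39): returns {(m, n): a_mn} with a_00 = -1, and beta^2."""
    i0 = window0.index((0, 0))
    R = np.array([[r(m1 - m2, n1 - n2) for (m2, n2) in window0]
                  for (m1, n1) in window0])
    b = np.linalg.solve(R, np.eye(len(window0))[i0])
    beta2 = 1.0 / b[i0]
    return dict(zip(window0, -beta2 * b)), beta2

def mvr_sdf(a, beta2, f1, f2, semicausal):
    """S_eps / |1 - sum a_mn z1^-m z2^-n|^2 at normalized frequencies."""
    z1, z2 = np.exp(2j * np.pi * f1), np.exp(2j * np.pi * f2)
    # a_00 = -1, so minus the sum over W0 equals 1 - (sum over W)
    A = -sum(c * z1 ** (-m) * z2 ** (-n) for (m, n), c in a.items())
    s_eps = beta2
    if semicausal:
        # moving average in the first variable, white in the second
        s_eps = beta2 * (1 - sum(c * z1 ** (-m) for (m, n), c in a.items()
                                 if n == 0 and m != 0)).real
    return s_eps / abs(A) ** 2

rho = 0.9
r = lambda k, l: rho ** (abs(k) + abs(l))      # assumed separable covariance
causal_w0 = [(m, n) for m in (0, 1) for n in (0, 1)]
semicausal_w0 = [(m, n) for m in (-1, 0, 1) for n in (0, 1)]

def true_sdf(f1, f2):
    d = lambda f: abs(1 - rho * np.exp(-2j * np.pi * f)) ** 2
    return (1 - rho ** 2) ** 2 / (d(f1) * d(f2))

a_c, beta2_c = solve_mvr(r, causal_w0)
a_s, beta2_s = solve_mvr(r, semicausal_w0)
points = [(0.0, 0.0), (0.1, 0.3), (0.25, 0.05), (0.4, 0.45)]
causal_vals = [mvr_sdf(a_c, beta2_c, f1, f2, False) for f1, f2 in points]
semicausal_vals = [mvr_sdf(a_s, beta2_s, f1, f2, True) for f1, f2 in points]
true_vals = [true_sdf(f1, f2) for f1, f2 in points]
```

In this assumed case both model spectra coincide with the separable SDF at
the sampled frequencies, even though the two coefficient sets differ.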

2.7  Noncausal  Models 


The equations for noncausal models can also be derived by specifying
$W_0$ and using (38) and (39). The minimum variance condition for such
models requires $\epsilon_{i,j}$ to be a moving average field with SDF

$$ S_\epsilon(z_1,z_2) = \beta^2\left[1 - \sum\sum_{(m,n)\in W} a_{m,n}\, z_1^{-m} z_2^{-n}\right] $$  (47)

where W is a noncausal window (e.g., $W_0$ = [-p,p]x[-q,q],
$W_0 = W \cup (0,0)$). Using (47) in (36), it is seen that the SDF of
$u_{i,j}$ is an all-pole model and is therefore of the form of the ME
spectrum. However, due to the existence difficulties explained earlier, an
admissible covariance matrix R would not guarantee a non-negative spectrum.
Hence the usefulness of these models in spectral estimation is severely
limited.


3.  CONCLUSIONS 

In summary, we have considered extrapolation and spectral estimation
algorithms for discrete signals in one and two dimensions. For bandlimited
(oversampled) signals observed over a region of finite (and small) support,
we recommend that the signal be extrapolated first, followed by a suitable
spectral estimator (e.g., ME or smoothed periodogram). Several existing
extrapolation algorithms were shown to be unified under the minimum norm
least squares criterion of extrapolation, which was also shown to yield new
improved extrapolation algorithms via the conjugate gradient, least squares
and recursive methods. For other two dimensional sequences, the minimum
variance semicausal models seem to yield high resolution spectra compared
to other methods. This approach requires only the solution of linear
equations, yet yields an ARMA spectral estimate. Not described in this
paper are the implementation and other obvious practical considerations
such as i) efficient solution of the block Toeplitz equations (38), (39),
ii) use of two dimensional data (rather than autocorrelations r(k,l)),
iii) determination of the order of the model, etc. These and other related
considerations, as well as details of semicausal modeling, are reported
in [22].


ACKNOWLEDGMENT 

The author is grateful to S. Ranganath for the computer implementation of
several of the algorithms reported here, and also to Phil Jackson of the
Environmental Research Institute of Michigan for verifying the semicausal
model spectral estimation algorithm and Figs. 11 and 12.


References 


1. A. Papoulis, "A New Algorithm in Spectral Analysis and Band-limited
Extrapolation," IEEE Trans. Circuits Sys., Vol. CAS-22, pp. 735-742, Sept.
1975.

2. M. S. Sabri and W. Steenaart, "An Approach to Band-limited Signal
Extrapolation: The Extrapolation Matrix," IEEE Trans. Circuits Sys., Vol.
CAS-25, pp. 74-78, Feb. 1978.

3. J. A. Cadzow, "Improved Spectral Estimation from Incomplete Sampled
Data Observations," Proc. RADC Spectrum Estimation Workshop, pp. 109-123,
May 1978.

4. R. W. Gerchberg, "Super Resolution Through Error Energy Reduction,"
Optica Acta, Vol. 21, pp. 709, 1974.

5. D. Slepian, "Prolate Spheroidal Wave Functions, Fourier Analysis and
Uncertainty - V: The Discrete Case," The Bell Syst. Tech. J., Vol. 57,
No. 5, pp. 1371-1430, May-June 1978.

6. A. K. Jain and S. Ranganath, Extrapolation and Spectral Estimation
Techniques for Discrete Time Signals, Technical Phase Report, RADC-TR-
79-124, Rome Air Development Center, Griffiss Air Force Base, N.Y.
13441, May 1979.

7. C. W. Helstrom, "Image Restoration by the Method of Least Squares,"
J. Opt. Soc. Am., Vol. 57, pp. 297-303, March 1967.

8. C. L. Rino, "Bandlimited Image Restoration by Linear Mean Square
Estimation," J. Opt. Soc. Am., Vol. 59, pp. 547-558, May 1969.


9. A. K. Jain, "An Operator Factorization Method for Restoration of Blurred
Images," IEEE Trans. Computers, Vol. C-26, pp. 1061-1071, Nov. 1977.

10. D. Slepian, et al., "Prolate Spheroidal Wave Functions, Fourier Analysis
and Uncertainty Principle," Bell Syst. Tech. J., Vol. 40, No. 1, pp. 43-84,
1961.

11. A. Papoulis and M. S. Bertran, "Digital Filtering and Prolate Functions,"
IEEE Trans. Circuit Theory, Vol. CT-19, No. 6, pp. 674-681, Nov. 1972.

12. V. R. Algazi and M. Suk, "On the Frequency Weighted Least-Square Design
of Finite Duration Filters," IEEE Trans. Circuits Sys., Vol. CAS-22, No.
12, pp. 943-953, 1975.

13. A. K. Jain and S. Ranganath, "Two Dimensional Spectral Estimation," Proc.
RADC Workshop on Spectral Estimation, Rome N.Y., pp. 151-157, May 1976.

14. W. Rudin, "The Extension Problem for Positive Definite Functions,"
Ill. J. Math., Vol. 7, pp. 532-539, 1963.

15. B. Dickinson, "Two-Dimensional Markov Spectrum Estimates Need Not Exist,"
(to appear)

16. A. K. Jain, "A Semicausal Model for Recursive Filtering of Two
Dimensional Images," IEEE Trans. Computers, Vol. C-26, pp. 345-350, April
1977.

17. A. K. Jain, "Partial Differential Equations and Finite Difference Methods
in Image Processing, Part I: Image Representation," J. Optimiz. Th. Appl.,
Vol. 23, pp. 817-834, Sept. 1977.

18. A. K. Jain and J. R. Jain, Part II (of above): Image Restoration,
IEEE Trans. Aut. Contr., Vol. AC-23, pp. 817-834, Oct. 1978.

19. E. Angel and A. K. Jain, "Frame to Frame Restoration of Diffusion Images,"
IEEE Trans. Aut. Contr., Vol. AC-23, pp. 850-855, Oct. 1978.

20. A. K. Jain and S. H. Wang, "Stochastic Image Models and Hybrid Coding,"
Final Report NOSC Contract, NOSC953-77-C-003 MJE, Dept. Elect. Engr. SUNY
Buffalo, New York, Oct. 1977.

21. S. H. Wang, "Applications of Stochastic Models in Image Data Compression,"
Ph.D. Thesis, Dept. Elect. Engr. SUNY, Buffalo, 1979.

22. A. K. Jain, Final Report, ARO grant DAAG29-78G0206, Signal and Image
Processing Laboratory, Dept. Elec. Engr., U.C. Davis, CA 95616 (to appear)


Figure 5: Extrapolation by Stabilized Inverse

Figure 6: Extrapolation by Conjugate Gradient Algorithm (10 iterations)

Figure 7: Extrapolation of Noisy Data by Conjugate Gradient Algorithm

Figure 8: Extrapolation of Noisy Data by the Mean Square Extrapolation Filter

Figure 10: Causal Model Spectrum Contours


Table 1: Comparison between Causal and Semicausal Random Field Realizations
(rows from top to bottom: n = 2, 1, 0; columns from left to right: m = 0, 1, 2)

        Actual covariances     Causal model           Semicausal model
        r(m,n)                 covariance mismatch    covariance mismatch

n = 2   .905   .894   .868     .208   .315   .389     .039   .041   .047
n = 1   .951   .932   .894     .111   .231   .315     .019   .025   .036
n = 0   1.00   .951   .905     0.     .111   .208     0.     .017   .031

The actual covariances satisfy r(-m,n) = r(m,-n) = r(-m,-n) = r(m,n).
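The "actual covariances" block of Table 1 can be reproduced directly from
the covariance function of Remark 1. The short sketch below assumes that
function is $r(m,n)=\exp\{-0.05\sqrt{m^2+n^2}\}$ and that the table rows run
from n = 2 (top) to n = 0 (bottom); the variable names are illustrative
only.

```python
import numpy as np

# Lags m = 0, 1, 2 along columns and n = 0, 1, 2 along rows
m, n = np.meshgrid(np.arange(3), np.arange(3))
actual = np.exp(-0.05 * np.sqrt(m ** 2 + n ** 2))

# Flip the rows so that n = 2 is printed first, as in the table layout
table_block = np.round(actual[::-1], 3)
print(table_block)
```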



ANTENNA  SPACIAL  PATTERN  VIEWPOINT  OF  MEM,  MLM, 
AND  ADAPTIVE  ARRAY  RESOLUTION 

WILLIAM  F.  GABRIEL 

Radar  Division 
Naval  Research  Lab 
Washington,  DC  20375 

Abstract 


The Burg maximum entropy method (MEM) and the maximum likelihood method
(MLM) nonlinear spectral estimation techniques are compared with their
similar adaptive array antenna counterparts. The comparison permits an
examination of their principles of operation from the antenna array spacial
pattern viewpoint, and qualifies their superresolution performance behavior.
Also, the real-time adaptive resolution of two incoherent sources located
within a beamwidth was simulated, and results are presented over an array
output SNR range of 0 to 40 dB.


Introduction 


Adaptive array processing techniques are being investigated to determine
their applicability to high-resolution location of sources/targets. The work
was motivated by high-resolution performance reported in the field of
spectral analysis in recent years, particularly from the two nonlinear
techniques generally identified as the maximum entropy method (MEM) [1-3]
and the maximum likelihood method (MLM) [4-5]. MEM and MLM bear a very close
relationship to nonlinear adaptive array processing techniques. It is the
purpose of this paper to point out a few of these similarities, examine
their principles of operation from the "spacial filter" pattern viewpoint of
adaptive array antennas, and to discuss some of the limitations to be
expected in their superresolution behavior.

MEM and the Adaptive Sidelobe Canceller

The Burg MEM has been shown to be equivalent to least mean square (LMS)
error linear prediction [7-9], where an optimum K point prediction filter
predicts the nth value of a sequence from K past values,

$$ \hat{y}_n = \sum_{k=1}^{K} a_k\, y_{n-k} $$  (1)

where $\hat{y}_n$ is the predicted sample, the $a_k$ are optimum weighting
coefficients, and the K past samples $y_{n-k}$ are presumed known. Define
the difference between this predicted value and the true value of $y_n$ as
the error, $\epsilon_n$, which is to be LMS minimized over a larger data
sequence of N samples, N > K.

$$ \epsilon_n = (y_n - \hat{y}_n) $$  (2)

The z-transforms associated with this discrete convolution may be written,

$$ \epsilon(z) = \left[1 - \sum_{k=1}^{K} a_k z^{-k}\right] Y(z) $$  (3)

where the expression within the brackets may be defined as the filter
transform function, H(z), consisting of a polynomial with K roots or zero
factors. If we optimize the weights $a_k$ in such a manner that the spectrum
of $\epsilon$ approaches white noise, then the unknown spectrum of the input
is approximated by,

$$ |Y(\omega)|^2 = \frac{|\epsilon(\omega)|^2}{|H(\omega)|^2} \approx \frac{(\text{CONSTANT})}{\left|\prod_{k=1}^{K}\left(1 - d_k e^{-j\omega}\right)\right|^2} $$  (4)

Conversion of the above linear prediction filter to a weighted linear
array of spacial sensors is straightforward [10, 11], with the simplest
configuration illustrated in Fig. 1. We assume that our sensor elements are
equally spaced, and that narrowband filtering precedes our spacial domain
processing. The nth "snapshot" signal sample at the kth element will consist
of independent gaussian receiver noise, $\eta_{kn}$, plus I incoherent
source voltages,

$$ E_{kn} = \eta_{kn} + \sum_{i=1}^{I} J_i\, e^{j(k u_i + \phi_{in})}, \qquad 1 \le k \le K $$  (5)

where $u_i = 2\pi (d/\lambda) \sin\theta_i$
d = element spacing, assumed near $\lambda/2$
$\lambda$ = wavelength
$\theta_i$ = spacial location angle of ith source
$J_i$ = amplitude of ith source
$\phi_{in}$ = random phase of ith source, nth sample
k = element index
n = snapshot sample index

A "snapshot" is defined as one simultaneous sampling of the aperture signals
at all array elements, and we assume that N snapshots of data are available.

A brief examination of Fig. 1 from the standpoint of adaptive arrays
leads to the conclusion that it is identical in configuration to a special
subclass commonly referred to in the literature as a "sidelobe canceller"
[12, 13]. A typical sidelobe canceller configuration from Applebaum [13]
is illustrated in Fig. 2. For the benefit of those who may not be familiar
with them, it should be noted that the unweighted mainbeam "element" is
usually different and of much higher gain than the others, and the elements
may or may not be equally spaced. They are designed to be operated on the
basis of many successive snapshots (assuming digital operation) because their
environment generally involves weak desired signals and an abundance of
interference source data. They are a prediction filter in the sense that,
after convergence, they are predicting the signal at the phase center of the
mainbeam element.

The pertinence of the adaptive sidelobe canceller to our linear prediction
filter is that their spacial filter pattern analysis is well-developed
and can be applied directly to achieve a better understanding of the
superresolution performance behavior. A further point is that real-time
operation is readily achieved via most of the current adaptive algorithms,
provided that the number of snapshots is sufficient to reach convergence in
whitening. Convergence may require as little as 2 snapshots or as many as
several thousand, depending upon the particular algorithm and the parameters
of the source distribution.


Spacial Filter Patterns


The spacial filter function for the array of Fig. 1 is simply the
adapted pattern after convergence, which is commonly referred to as the
steady-state adapted pattern and may readily be computed from the inverse
of the sample covariance matrix [13],

$$ \mathbf{W}_o = \mu M^{-1} \mathbf{S}^* $$  (6)

$$ \mathbf{S}^{*t} = [0,0,0,0,0,0,0,1] $$  (7)

$$ M = \frac{1}{N}\sum_{n=1}^{N} M_n $$  (8)

$$ M_n = \mathbf{E}_n^* \mathbf{E}_n^t $$  (9)

where $\mathbf{E}_n$ is the nth "snapshot" signal sample vector whose
element components are given by equation (5), $M_n$ is the nth snapshot
contribution to the covariance matrix, M is the sample covariance matrix
averaged over N snapshots, $\mathbf{S}^*$ is the quiescent weight steering
vector, $\mu$ is a scalar quantity, and $\mathbf{W}_o$ is the optimum weight
vector. Note that the steering vector $\mathbf{S}^*$ injects a zero weight
on every element except for the end element, thus causing the quiescent
pattern of the array to be that of the single end element. Fig. 3
illustrates a typical quiescent (single element) pattern and an adapted
pattern obtained from an 8 element linear array with two far-field,
incoherent, 30 dB sources located at 18 and 22 degrees. The adapted pattern
weights were computed per equation (6) from the inverse of the covariance
matrix averaged over 1024 simulated snapshots. Note that the two pattern
nulls (zeros) align perfectly with the locations of the two sources. Of
course, the array signals in this simulation were corrupted only by receiver
noise (no element errors are included) and an average over 1024 snapshots is
indeed steady-state. Another important point to note is that nulls in such
an adapted pattern may be located arbitrarily close together in terms of
beamwidth, without violating any physical principle. Yet, because the nulls
have served to locate two sources within a beamwidth, one may describe this
as a "superresolution" pattern.
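The two-source case just described can be imitated with a short simulation.
The sketch below is not the author's program: it assumes half-wavelength
element spacing (so that $u = \pi\sin\theta$), unit-power complex gaussian
receiver noise, 30 dB taken as the per-element source-to-noise ratio, and an
arbitrary random seed. The names `steering` and `adapted_pattern_db` are
illustrative.

```python
import numpy as np

K, N = 8, 1024                        # elements, snapshots
source_deg = np.array([18.0, 22.0])
amplitude = 10 ** (30.0 / 20.0)       # 30 dB above unit-power receiver noise
rng = np.random.default_rng(0)        # fixed, arbitrary seed

k = np.arange(1, K + 1)[:, None]      # element index as a column vector

def steering(theta_deg):
    """exp(j k u) with u = pi sin(theta), i.e. half-wavelength spacing."""
    u = np.pi * np.sin(np.deg2rad(np.atleast_1d(theta_deg)))
    return np.exp(1j * k * u[None, :])           # shape (K, number of angles)

# Snapshots: receiver noise plus two incoherent sources with random phases
E = (rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))) / np.sqrt(2)
for theta in source_deg:
    phase = rng.uniform(0.0, 2.0 * np.pi, size=N)
    E = E + amplitude * steering(theta) * np.exp(1j * phase)[None, :]

M = (E.conj() @ E.T) / N              # sample covariance, average of E* E^t
S = np.zeros(K)
S[-1] = 1.0                           # steering vector selecting the end element
W = np.linalg.solve(M, S)             # equation (6), up to the scalar mu
W = W / W[-1]                         # normalize the end-element weight to 1

def adapted_pattern_db(theta_deg):
    """Adapted pattern 20 log10 |W^t v(theta)|; the quiescent level is 0 dB."""
    return 20.0 * np.log10(np.abs(W @ steering(theta_deg)))

grid = np.arange(-90.0, 90.0, 0.05)
pattern_db = adapted_pattern_db(grid)
depth_at_sources = adapted_pattern_db(source_deg)
```

Plotting `pattern_db` against `grid` should show the behavior described in
the text: a pair of deep nulls near the two source angles, superimposed on a
pattern that is otherwise of the order of the single-element level.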

It is readily shown that this adapted pattern is obtained by subtracting
the summed array output pattern from the element (mainbeam) pattern and,
furthermore, that the summed array pattern consists of properly weighted
"eigenvector beams" [14]. Written in terms of the eigenvector weights, we
can express the optimum weights in the form,


where $\mathbf{e}_i$ is the ith eigenvector of the covariance matrix,
$\beta_i$ is the ith eigenvalue, and $\beta_o$ is the smallest eigenvalue
corresponding to receiver noise power. Note that only the significant
eigenvectors corresponding to $\beta_i > \beta_o$ need be considered here.
An adaptive array forms one such eigenvector beam for each degree of freedom
consumed in nulling out the spacial source distribution. Fig. 4 illustrates
the two eigenvector beams required for this two-source example. It should be
emphasized here that the true resolution and signal gain of the array is
reflected in these eigenvector beams. They demonstrate the importance of
having as wide an aperture as possible, because the superresolution
capability in the adapted pattern is a percentage of the true resolution of
these beams. Also, since the superresolution nulls are formed via the
subtraction of these beams of conventional width, it follows that the nulls
will be rather delicate and very sensitive to system imperfections and
signal fluctuations.
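One standard way of writing such an eigenvector-beam decomposition is to
subtract, from the quiescent steering vector, each significant eigenvector
weighted by $(1 - \beta_o/\beta_i)$ times its projection on the steering
vector. The sketch below checks this identity numerically against a direct
solve. It is offered only as an illustration under stated assumptions: the
exact (not sampled) covariance of two 30 dB sources in unit noise,
half-wavelength spacing, and an overall scale factor $1/\beta_o$ that may
differ from the normalization used in the paper.

```python
import numpy as np

K = 8
k = np.arange(1, K + 1)

def steering(theta_deg):
    """exp(j k u) with u = pi sin(theta) (assumed half-wavelength spacing)."""
    return np.exp(1j * k * np.pi * np.sin(np.deg2rad(theta_deg)))

power = 1000.0                                   # 30 dB sources, unit noise
M = np.eye(K, dtype=complex)
for theta in (18.0, 22.0):
    v = steering(theta)
    M += power * np.outer(v.conj(), v)           # exact M = E[E* E^t]

beta, e = np.linalg.eigh(M)                      # eigenvalues in ascending order
beta0 = beta[0]                                  # receiver noise level
significant = beta > beta0 * (1 + 1e-6)

S = np.zeros(K, dtype=complex)
S[-1] = 1.0                                      # end-element steering vector

# Quiescent weights minus the weighted eigenvector beams
W_eig = S.copy()
for i in np.flatnonzero(significant):
    W_eig -= (1 - beta0 / beta[i]) * e[:, i] * (e[:, i].conj() @ S)
W_eig /= beta0

W_direct = np.linalg.solve(M, S)                 # M^{-1} S for comparison
```

Only two eigenvectors are significant here, one for each degree of freedom
consumed by the two sources.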

The desired "spacial spectrum pattern" is then obtained from equation
(4) as simply the inverse of the adapted pattern. Fig. 5 illustrates this
inverse for the two-source example, in comparison with the output of a
conventional beam scanned through the two sources. Several comments are
in order concerning such inverse patterns:

a. They are not true antenna patterns, because there is no combination
of the element weights that could produce such a peaked spacial pattern.
They are simply a function computed from the reciprocal of a true antenna
pattern.

b. Linear superposition does not hold in either the inverse or the
original adapted pattern, because of the nonlinear processing involved.

c. The heights of the peaks do not correspond with the relative
strengths of the sources, because the depths of the adapted pattern nulls
do not. In general, the adaptive null depth will be proportional to the
square of the SNR of a source [14], but even this relationship fails when
there are multiple sources closely spaced.

d. There is no real-signal output port associated with such a pattern,
because it is not a true antenna pattern. An output could be simulated, of
course, by implementing the equivalent all-pole filter and driving it with
white noise.

e. They do emphasize the locations of the zeros (nulls) of the
adaptive array filter polynomial.

f. They are inherently capable of superresolution.

g. They achieve good "contrast" with the quiescent pattern background
ripple (equivalent of "sidelobes") because of the aforementioned
proportionality to the square of source strengths.

h. Spacial information is gained beyond that obtained from a conventional
array beam which is scanned through the sources, because the array
degrees of freedom are utilized in a more effective, data adaptive manner.

To get a feel for real-time operation performance with realistic weight
update averaging, simulations were run in which an eight element array had
its weights computed from the Howells-Applebaum recursive algorithm [15].
Weight update averaging was performed via a dynamic time constant in
accordance with the reciprocal of the closed-loop bandwidth, $\alpha$,

$$ \tau = \frac{1}{\alpha} = \frac{\tau_o + \tau_r P}{1 + P} $$  (11)

where $\tau_o$ = quiescent conditions slow time constant
$\tau_r$ = high-power fast time constant
P = snapshot power ratio

where we approach the value $\tau_o/2$ under quiescent conditions when
$P \approx 1$, and we approach the value of $\tau_r$ when $P \gg 1$. This
formulation permits us to satisfy the 10 percent bandwidth criterion at high
power levels to avoid noisy weights [14] by choosing the value of
$\tau_r = 3.2$, and yet the quiescent condition time constant need be no
worse than $\tau_o = 200$. The larger value for $\tau_o$ is necessary in
order to have a relatively stable quiescent pattern.

Fig. 7 illustrates typical snapshot spectrum plots, after convergence,
for our two-source case at two different SNR levels. Note the considerable
fluctuations which occur in these plots near the peaks, which merely reflect
the null fluctuations in the adapted pattern. The deteriorating conditions
exhibited in Fig. 7b are indicative of the resolution capability nearing its
limit, i.e., if the source power levels are reduced further, then the
adaptive array cannot resolve them at that particular spacing.

A summary of the approximate resolution capability limit for the
adaptive array spacial filter operating against two incoherent sources is
illustrated in Fig. 8. This performance curve is universal in nature because
the abscissa is source separation in beamwidths, and the ordinate is source
SNR measured at the array output, i.e., element SNR multiplied by the number
of elements in the array. Thus, the curve can be utilized for any number
of array elements in a linear array configuration. Note that at low ordinate
SNR values, we actually have negative SNR (in dB) at the elements. The curve
tells us that we can separate two sources at arbitrarily small spacings,
provided we have sufficient SNR and, also, provided that our element data
samples are sufficiently accurate. Recall that the simulations involved here
did not include any element errors.
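
The relation between the ordinate of the universal curve (array-output SNR)
and the SNR at a single element can be illustrated with a short Python
sketch. It only restates "element SNR multiplied by the number of elements"
in decibels; the function names and the example values are assumptions
chosen for illustration and do not come from the simulations.

```python
import math

def array_output_snr_db(snr_element_db: float, num_elements: int) -> float:
    """Array-output SNR in dB: element SNR multiplied by the number of elements."""
    return snr_element_db + 10.0 * math.log10(num_elements)

def element_snr_db(snr_array_db: float, num_elements: int) -> float:
    """Element SNR in dB implied by a given array-output SNR."""
    return snr_array_db - 10.0 * math.log10(num_elements)

if __name__ == "__main__":
    # Hypothetical case: 8 elements and 5 dB at the array output.
    # The element SNR comes out negative in dB.
    print(round(element_snr_db(5.0, 8), 2))
```

For an eight element array the offset is 10 log10(8), about 9 dB, so any
ordinate value below roughly 9 dB corresponds to a negative element SNR.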

If there are more than two sources within a beamwidth or if coherence
exists among the sources, then difficulties mount rapidly and the filter
null points may not accurately represent the spacial locations of the
sources.




MLM  and  Adaptive  Directional  Constraints 


The maximum likelihood spectral estimate is defined as a filter designed
to pass the power in a narrow band about the signal frequency of interest,
and to minimize or reject all other frequency components in an optimal
manner [4, 5]. This is identical to the use of a zero-order mainbeam
directional gain constraint in adaptive arrays [16, 17], where the "spacial
spectrum" would be estimated by the output residual power, $P_o$, from the
optimized adapted array weights,

$$P_o = W_o^{*t} M W_o \qquad (12)$$

where $W_o = \mu M^{-1} S^*$ (optimized weights)
      $M$ = covariance matrix estimate
      $S^*$ = mainbeam direction steering vector
      $\mu$ = scalar quantity

Under the zero-order gain constraint, we require $S^t W_o = 1$, whereupon
$\mu$ becomes

$$\mu = (S^t M^{-1} S^*)^{-1} \qquad (13)$$

Substituting $\mu$ and $W_o$ into equation (12) then results in,

$$P_o = \frac{1}{S^t M^{-1} S^*} \qquad (14)$$

Upon sweeping the steering vector, $S^*$, for a given covariance matrix
inverse, $P_o$ will estimate the spacial spectrum. Interestingly, this result
is identical (within a constant) to the spectrum obtained from the inverse of
the output residual power from an unconstrained optimized adapted array, and
the principle of operation is the output from a continuously adapting pattern
formed by subtracting eigenvector beams from the quiescent uniform
illumination steering vector "mainbeam" as it scans.
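
As an illustration of the swept estimate, the following minimal Python
sketch evaluates the residual power under the unit-gain constraint, i.e. the
reciprocal of the quadratic form of the steering vector with the inverse
covariance matrix. It is a sketch under stated assumptions, not the
simulation used here: the covariance matrix is an idealized one (unit
receiver noise plus incoherent plane-wave sources), the array is uniform
with half-wavelength spacing, and all function names are hypothetical. The
list `s` in the code plays the role of $S^*$, so that `s^H` corresponds to
$S^t$.

```python
import cmath
import math

def steering_vector(num_elements, spacing_wl, angle_deg):
    """Unit plane-wave response of a uniform line array (spacing in wavelengths)."""
    u = math.sin(math.radians(angle_deg))
    return [cmath.exp(2j * math.pi * spacing_wl * k * u) for k in range(num_elements)]

def ideal_covariance(sources, num_elements, spacing_wl, noise_power=1.0):
    """M = noise_power * I + sum over incoherent sources of power * s s^H."""
    n = num_elements
    m = [[complex(noise_power if r == c else 0.0) for c in range(n)] for r in range(n)]
    for angle_deg, power in sources:
        s = steering_vector(n, spacing_wl, angle_deg)
        for r in range(n):
            for c in range(n):
                m[r][c] += power * s[r] * s[c].conjugate()
    return m

def solve(a, b):
    """Solve a x = b for complex a, b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x

def mlm_power(m, s):
    """Residual power under the unit-gain constraint: 1 / (s^H M^-1 s)."""
    y = solve(m, s)                                          # y = M^-1 s
    quad = sum(sk.conjugate() * yk for sk, yk in zip(s, y))  # s^H M^-1 s
    return 1.0 / quad.real

if __name__ == "__main__":
    n, d = 8, 0.5                                   # assumed array geometry
    m = ideal_covariance([(10.0, 100.0)], n, d)     # one source: 10 deg, power 100
    for angle in (-30.0, 0.0, 10.0, 40.0):
        p = mlm_power(m, steering_vector(n, d, angle))
        print(f"{angle:6.1f} deg  {10 * math.log10(p):7.2f} dB")
```

For a single source of power p in unit noise, the value at the source
direction works out analytically to p + 1/N, which is one way to see why the
peak heights of this estimate track the source power relative to receiver
noise.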


Fig. 6 illustrates the output spectrum plotted from $P_o$ for the
two-source case utilized for Figs. 3, 4, and 5. Note that in comparison with
Fig. 5, this MLM spectrum has peaks which are about 18 dB lower and thus of
less resolution capability. However, the two peaks have located the sources
correctly and, in addition, the peak values reflect the true power levels of
the sources. This is in agreement with the observations of Lacoss [5] and
others. Although this technique has less resolution than the previous one
and requires more computation in plotting the output spectrum, it does
offer several rather significant advantages:

a.  The  output  power  is  directly  referenced  to  receiver  noise  power, 
thus  permitting  calibration  and  measurement  of  relative  source  strength. 

b.  If the sources can be resolved, then a pseudo-linear superposition
holds at the peaks, and they should reflect the true relative source
strengths.

c.  The  output  of  this  filter  is  a  real  signal,  and  if  the  filter  pass- 
band  is  steered  to  a  particular  source,  one  can  monitor  that  source  at  full 
array  gain  while  rejecting  all  other  sources. 

d.  The  residual  background  spacial  ripple  (the  equivalent  of  pattern 
"sidelobes")  is  very  low  and  well  behaved. 

e.  It  is  not  necessary  to  have  the  elements  equally  spaced.  Thus,  one 
should  take  advantage  of  this  property  to  spread  them  out  for  a  wider  aper¬ 
ture  and  substantially  increase  the  resolution  for  a  given  number  of  elements. 
This  is  done  in  the  field  of  Geophysics  [4].  By  so  doing,  it  is  very  likely 
that  this  method  could  equal  the  resolution  of  the  previous  technique. 

References

1.  D. G. Childers, 1978, "Modern Spectrum Analysis". IEEE Press (Note:
    This book contains complete copies of the following references Nos. 2,
    3, 4, 5, 6, 7, 8, 10).

2.  J. P. Burg, 1967, "Maximum Entropy Spectral Analysis". Proc. of the 37th
    Meeting of the Society of Exploration Geophysicists.

3.  J. P. Burg, 1968, "A New Analysis Technique for Time Series Data",
    presented at the NATO Advanced Study Institute on Signal Processing
    with Emphasis on Underwater Acoustics, Enschede, Netherlands.

4.  J. Capon, August 1969, "High-Resolution Frequency-Wavenumber Spectrum
    Analysis". Proc. IEEE, Vol. 57, pp. 1408-1418.

5.  R. T. Lacoss, August 1971, "Data Adaptive Spectral Analysis Methods".
    Geophysics, Vol. 36, pp. 661-675.

6.  J. P. Burg, April 1972, "The Relationship Between Maximum Entropy Spectra
    and Maximum Likelihood Spectra". Geophysics, Vol. 37, pp. 375-376.

7.  A. Van den Bos, July 1971, "Alternative Interpretation of Maximum Entropy
    Spectral Analysis". IEEE Trans. on Information Theory, IT-17, pp. 493-494.

8.  L. J. Griffiths, April 1975, "Rapid Measurement of Digital Instantaneous
    Frequency". IEEE Trans. on Acoustics, Speech, and Signal Processing,
    Vol. 23, pp. 207-222.

9.  D. R. Morgan and S. E. Craig, December 1976, "Real-Time Adaptive Linear
    Prediction Using the Least Mean Square Gradient Algorithm". IEEE Trans.
    on Acoustics, Speech, and Signal Processing, Vol. 24, pp. 494-507.

10. R. N. McDonough, December 1974, "Maximum-Entropy Spatial Processing of
    Array Data". Geophysics, Vol. 39, pp. 843-851.

11. W. R. King, March 1979, "Maximum Entropy Spectral Analysis in the
    Spatial Domain". NRL Report 8298.

12. P. W. Howells, September 1976, "Explorations in Fixed and Adaptive
    Resolution at GE and SURC". IEEE Trans. on Antennas and Propagation,
    Vol. 24, pp. 575-584.

13. S. P. Applebaum, September 1976, "Adaptive Arrays". IEEE Trans. on
    Antennas and Propagation, Vol. 24, pp. 585-598.

14. W. F. Gabriel, February 1976, "Adaptive Arrays - An Introduction".
    Proc. of IEEE, Vol. 64, pp. 239-272.

15. I. S. Reed, J. D. Mallett, and L. E. Brennan, November 1974, "Rapid
    Convergence Rate in Adaptive Arrays". IEEE Trans. on Aerospace and
    Electronic Systems, Vol. 10, pp. 853-863.

16. O. L. Frost, August 1972, "An Algorithm for Linearly Constrained
    Adaptive Array Processing". Proc. of IEEE, Vol. 60, pp. 926-935.

17. S. P. Applebaum and D. J. Chapman, September 1976, "Adaptive Arrays
    with Mainbeam Constraints". IEEE Trans. on Antennas and Propagation,
    Vol. 24, pp. 650-662.


FIGURE 1.  Array Aperture Linear Prediction Spacial Filter Model


FIGURE 2.  Typical Adaptive Array Sidelobe Canceller Configuration


[Plot: spacial spectrum inverse compared with conventional scanned beam;
abscissa: spacial angle in degrees, -90 to 90]

FIGURE 5.  Spacial Spectrum Inverse Pattern for the Two-Source Case
of Figure 3, and Comparison with Output of Conventional
Scanned Beam


[Plot: MLM spectrum compared with conventional scanned beam; abscissa:
spacial angle in degrees]

FIGURE 6.  MLM Spacial Spectrum Plotted from Residual Power of Adaptive
Zero-Order Mainbeam Constraint for the Two-Source Case of Fig. 3


FIGURE 8.  Universal Approximate Resolution Limit for Two Incoherent
Sources. Simulation Conditions: Narrowband, No Array Errors,
λ/2 Element Spacing, Linear Array, Gaussian Receiver Noise



APERTURE  SAMPLING  PROCESSING  FOR 
GROUND  REFLECTION  ELEVATION  MULTIPATH  CHARACTERIZATION  * 


JAMES E. EVANS and DAVID F. SUN


M.I.T. Lincoln Laboratory
Lexington, Mass. 02173


Abstract


The angular resolution and tracking of closely spaced targets is a
classical radar problem which is receiving increased attention, and terrain
multipath (e.g., reflections) has long been recognized to be a principal
limitation on the achievable accuracy of radar elevation trackers at low
elevation angles. This paper discusses the use of aperture sampling
processing to improve the angular resolution/tracking and to characterize
the multipath environment. The received signal is measured along the antenna
aperture and the modern "high resolution" spectral estimation techniques
(e.g., maximum likelihood and maximum entropy method) are applied to the
spatial sample data. Experimental results of applying these techniques to
field data from an L-band elevation array are presented. It is shown that
maximum entropy processing offers improved performance in resolving
multipath features and low angle tracking.

I. Introduction


This paper presents the results of an experimental program to obtain a
better quantitative understanding of low angle microwave propagation
phenomena and to assess the potential for improved elevation tracking
performance by aperture sampling processing. It has long been recognized
that terrain multipath (e.g., reflections and/or shadowing) is a principal
limitation on the achievable accuracy of radar elevation trackers at low
angles [1,2]. Figure 1-1a illustrates the propagation phenomena of interest.
Since elevation tracker antennas generally have quite directional patterns
in the elevation plane, a critical factor in refining and predicting the
performance of an elevation tracker is the distribution of the received
signal power as a function of elevation angle (i.e., the so-called angular
power spectrum) [1]. Figure 1-1b illustrates the angular power spectrum that
might arise with the multipath environment shown in Figure 1-1a. With our
approach, we treat the problem of multipath environment characterization and
target elevation angle estimation as one of estimating the angular power
spectrum of the received signal.


*This work was sponsored by the Federal Aviation Administration. "The
views and conclusions contained in this document are those of the contractor
and should not be interpreted as necessarily representing the official
policies, either expressed or implicit, of the United States Government".




The approach taken here is based on the adaptive processing of the
received aperture information. Our goals are (1) to obtain a higher
resolution angular power spectrum than would be obtainable with the
"conventional" beam sum spectrum for better characterization of the
multipath environment, and (2) to achieve better estimation of the target
elevation angle than would be achievable with the "standard" tracking
methods (e.g., monopulse) by utilizing this knowledge of the multipath
characteristics. The starting point in our approach is measuring the
(complex) received waveform (i.e., the amplitude and phase) at various
points along the receiving antenna aperture. Next, we apply several high
resolution spectral analysis techniques to the spatial sample data. Here,
specifically, we consider the use of the maximum likelihood (ML) and the
maximum entropy (ME) spectral estimation methods which have been applied in
time series analysis and seismic/sonar array processing [3-6] for resolving
closely spaced spectral lines.

Although there is a simple duality between space and time (see Fig. 1-2)
which permits one to apply time series analysis techniques, several features
of our problem differ significantly from the usual time series application:

(1)  the data samples are complex (thus alleviating the "peak
     splitting" phenomena encountered with real data samples)

(2)  the number of data samples is generally small (e.g., 6-30)

and

(3)  in many cases the signals received from different directions
     are highly correlated in time, such that the sample spatial
     covariance is quite nonstationary. The consequence of this is
     that the results for certain phase relationships can differ
     substantially from the results for the ensemble covariance.

At various points in the subsequent discussion, we will illustrate the
impact of these various features on the overall performance.

The remainder of the paper is organized as follows. Section II briefly
describes the high resolution algorithms employed in our work and shows some
examples of applying them to synthetic data. These algorithms are then used
in analyzing field data from an L-band terrain reflection measurement
program. These experimental results are presented in Section III, followed
by the summary of the results in the last section.

II. HIGH RESOLUTION SPECTRUM ESTIMATION PROCEDURES

There has been much discussion of the maximum likelihood (ML) and the
maximum entropy (ME) techniques recently in the geophysics and time series
analysis literature. Thus, we will only present the key ideas together with
pertinent references. The presentation of the various techniques can be
facilitated by use of vector notation. The notational convention we will use
is that column vectors or matrices are represented by underlined lower-case
letters. The asterisk denotes conjugate transposition. Underlined uppercase
letters represent Hermitian matrices. The vector $\underline{s}$ represents
the complex sensor outputs of the line array while the covariance matrix
$\underline{R}$ has as its i,jth entry

$$R_{ij} = s_i s_j^*$$

The ML estimation had its genesis in seismic array beamforming under
conditions of directional interference [3] and adaptive array nulling of
incoherent interfering sources [7]. The problem is formulated as determining
the minimum variance unbiased estimate of the power from a given angle
subject to the interference (complex) covariance matrix. If the interference
were Gaussian with a known covariance matrix $\underline{Q}$ (e.g., via
measurements in the absence of the desired signal), the maximum likelihood
estimate of the power in a plane wave from angle $\theta$ would be given by

$$P_{ML}(\theta) = \frac{\underline{e}^* \underline{Q}^{-1} \underline{R}\, \underline{Q}^{-1} \underline{e}}{(\underline{e}^* \underline{Q}^{-1} \underline{e})^2} \qquad (1)$$

where $\underline{e} = \exp(j 2\pi \underline{x} \sin\theta)$ is the
received signal vector corresponding to a unit plane wave from angle
$\theta$ and $\underline{R} = \underline{s}\, \underline{s}^*$ is the sample
covariance matrix.

Unfortunately, when the interfering signals are multipath and/or coherent
jammers, $\underline{Q}$ cannot be measured independently of the desired
signal. Capon [3] suggests using the sample covariance matrix
$\underline{R}$ as an estimate of $\underline{Q}$, so that the angle power
spectrum estimate is then given by

$$P_{ML}(\theta) = (\underline{e}^* \underline{R}^{-1} \underline{e})^{-1} \qquad (2)$$


The estimate (2) may be contrasted to the standard "beam sum" (BS) angle
power spectrum estimate of

$$P_{BS}(\theta) = |\Sigma|^2 = \left| \sum_k s(x_k)\, e^{-j 2\pi x_k \sin\theta} \right|^2 = \underline{e}^* \underline{R}\, \underline{e} \qquad (3)$$

This particular estimate gives rise to a sin Kθ/Kθ beam pattern which has a
high sidelobe level (-13 dB). By weighting the data samples, lower sidelobes
are obtained at the "cost" of wider beamwidths (i.e., poorer resolution) [8].
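
The -13 dB sidelobe figure can be checked numerically. The following Python
sketch, offered as an illustration rather than as the processing used in
this work, evaluates the beam sum estimate for a unit plane wave at
broadside on a uniformly spaced array and reads off the level of the first
sidelobe relative to the main lobe. The function names, the 32-sensor
half-wavelength array, and the 0.02 degree scan step are assumptions.

```python
import cmath
import math

def beam_sum_power(samples, positions_wl, angle_deg):
    """Beam sum estimate |sum_k s(x_k) exp(-j 2 pi x_k sin(theta))|^2."""
    u = math.sin(math.radians(angle_deg))
    total = sum(s * cmath.exp(-2j * math.pi * x * u)
                for s, x in zip(samples, positions_wl))
    return abs(total) ** 2

def first_sidelobe_db(num_sensors=32, spacing_wl=0.5, step_deg=0.02):
    """First sidelobe level (dB relative to the main-lobe peak), unweighted array."""
    positions = [k * spacing_wl for k in range(num_sensors)]
    samples = [1 + 0j] * num_sensors            # unit plane wave from broadside
    angles = [i * step_deg for i in range(int(60.0 / step_deg))]
    powers = [beam_sum_power(samples, positions, a) for a in angles]
    i = 0
    while i < len(powers) - 1 and powers[i + 1] < powers[i]:
        i += 1                                  # walk down the main lobe to the first null
    while i < len(powers) - 1 and powers[i + 1] > powers[i]:
        i += 1                                  # climb to the first sidelobe peak
    return 10.0 * math.log10(powers[i] / powers[0])

if __name__ == "__main__":
    print(round(first_sidelobe_db(), 2))
```

Applying a taper to `samples` before the sum is the weighting referred to
above; it lowers this sidelobe level while widening the main lobe.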


The use of the maximum entropy method for high resolution spectrum
estimation has been justified by a variety of arguments [4-6]. The physically
most meaningful argument for radar applications is that the received angular
spectrum can be represented by a finite number of poles in the complex plane,
i.e.,

$$P(\theta) = \left| \prod_{i=1}^{N} \left( e^{j 2\pi \delta \sin\theta} - z_i \right) \right|^{-2} \qquad (4)$$

where $z_i$ lies on or within the unit circle and the spatial samples are
taken at points $x_k = k\delta$. The case of $z_i$ on the unit circle could
correspond to discrete plane waves while $z_i$ inside the unit circle might
correspond to an extended target (e.g., diffuse reflections). Time samples
with the spectrum of (4) may be generated by passing a white noise process
through an all pole filter of order N, which is a standard model in
autoregression time series analysis. For the bulk of the data described
here, only a short nonstationary set of spatial samples was available.
Therefore, the Burg algorithm (modified for complex data values) has been
used to determine the values of $z_i$ [9].
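
The pole-fitting step can be illustrated with a minimal Python sketch of a
Burg recursion for complex data. It follows the standard textbook lattice
form and is not taken from [9]; the function names, the half-wavelength
sensor spacing, the single 10 degree plane wave, and the seeded noise level
are all assumptions made for illustration. The returned coefficients define
a prediction error filter whose zeros play the role of the poles in the
all-pole spectrum model.

```python
import cmath
import math
import random

def burg(samples, order):
    """Complex Burg recursion: returns filter a[0..order] (a[0] = 1) and error power."""
    n = len(samples)
    f = list(samples)                     # forward prediction errors
    b = list(samples)                     # backward prediction errors
    a = [1 + 0j]
    power = sum(abs(v) ** 2 for v in samples) / n
    for m in range(1, order + 1):
        num = sum(f[t] * b[t - 1].conjugate() for t in range(m, n))
        den = sum(abs(f[t]) ** 2 + abs(b[t - 1]) ** 2 for t in range(m, n))
        k = -2.0 * num / den              # reflection coefficient, |k| <= 1
        padded = a + [0j]
        a = [padded[i] + k * padded[m - i].conjugate() for i in range(m + 1)]
        f, b = (
            [f[t] + k * b[t - 1] if t >= m else f[t] for t in range(n)],
            [b[t - 1] + k.conjugate() * f[t] if t >= m else b[t] for t in range(n)],
        )
        power *= 1.0 - abs(k) ** 2
    return a, power

def me_spectrum(a, power, spacing_wl, angle_deg):
    """All-pole estimate: power / |sum_i a[i] exp(-j 2 pi i delta sin(theta))|^2."""
    w = 2.0 * math.pi * spacing_wl * math.sin(math.radians(angle_deg))
    denom = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
    return power / abs(denom) ** 2

def plane_wave_samples(angles_deg, num_sensors, spacing_wl, noise_sigma, seed=0):
    """Synthetic sensor outputs: unit plane waves plus independent complex noise."""
    rng = random.Random(seed)
    out = []
    for k in range(num_sensors):
        v = sum(cmath.exp(2j * math.pi * spacing_wl * k * math.sin(math.radians(th)))
                for th in angles_deg)
        out.append(v + complex(rng.gauss(0.0, noise_sigma), rng.gauss(0.0, noise_sigma)))
    return out

def peak_angle(a, power, spacing_wl, step_deg=0.1):
    """Angle on a uniform grid at which the all-pole estimate is largest."""
    grid = [-90.0 + step_deg * i for i in range(int(180.0 / step_deg) + 1)]
    return max(grid, key=lambda th: me_spectrum(a, power, spacing_wl, th))

if __name__ == "__main__":
    x = plane_wave_samples([10.0], num_sensors=9, spacing_wl=0.5, noise_sigma=0.01)
    a, p = burg(x, order=1)
    print(peak_angle(a, p, 0.5))
```

Because the recursion works directly on the forward and backward prediction
errors of one short record, it needs no covariance estimate, which is
convenient when only a handful of spatial samples are available.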


Much of the interest in the ME and ML algorithms has been generated by
experiments in applying these methods to synthetic data. Figs. 2-1 and 2-2
show examples of applying the various algorithms to synthetic data consisting
of 1 and 2 plane waves, respectively, with independent noise added to each
sensor sample.* The actual angular power spectrum in each case consists of
impulse functions at the plane wave angles. In the case of a single plane
wave, all three estimates give the same peak location; however, the high
resolution techniques more closely approximate the actual spectrum shape.

The example with two plane wave components is a case where the components
are too close (0.84 standard beamwidths) to resolve by classical means. In
this particular case, only the ME method gives an estimate close to the
actual spectrum. The failure of the ML technique in this case is noteworthy
also in view of its success in resolving plane wave signals when given the
ensemble covariance as illustrated in Fig. 2-3. Although the ME technique
was successful in the case shown in Fig. 2-2, at smaller separation angles
(e.g., 0.25 standard beamwidths) and unfavorable phase relationships (e.g.,
0° at the array center), it too is unsuccessful even at high signal to noise
ratio.


This significant discrepancy between resolution performance for the
ensemble and certain sample functions provides impetus for studies of
alternative estimators. In this context, mention should be made of
theoretical bounds on the performance of optimal two plane wave (sinusoid)
parameter estimators which suggest that significant improvements in
resolution performance at 0° or 180° phases may require very high signal to
noise ratios [10, 11].

A radar tracker can be viewed as attempting to determine the centroid of
the angular power spectrum peak corresponding to the direct signal. For an
elevation tracker, the direct signal generally corresponds to the peak with
the most positive elevation angle. Our initial effort has considered such a
peak finding criterion for ME spectra. Conventional radar trackers typically
approximate the beam sum centroid by determining the null of the ratio
$\epsilon(\theta) = [\Delta(\theta)/\Sigma(\theta)]$ where the difference
pattern $\Delta(\theta) \sim d\Sigma(\theta)/d\theta$. When only a direct
signal is present at angle $\theta_d$ and $\theta$ is within 1 beamwidth of
$\theta_d$,

$$\epsilon(\theta) \approx (N\delta)(\theta - \theta_d) \qquad (5)$$

so that one can estimate $\theta_d$ without pointing the array at
$\theta_d$. This gives rise to an "off boresight" elevation tracker whereby
$\theta$ is constrained to be $\geq 0.7/N\delta$ and (5) is used to estimate
$\theta_d$ if the last estimate of $\theta_d$ is less than $0.7/N\delta$.
This keeps the main lobes of $\Sigma(\theta)$ and $\Delta(\theta)$ pointed
above the terrain and thus significantly reduces the errors due to multipath
signals at elevation angles below $\theta_d$ [1].

*In these and the following figures of this type, each type of angular
spectrum estimate has been individually normalized to yield 0 dB peak value.

III. Experimental Results

As mentioned earlier, the terrain multipath is a principal limitation on
the achievable accuracy of radar elevation trackers at low elevation angles.
Low angle tracker development and performance prediction has been inhibited
by the lack of experimental data on the angular distribution of the scattered
power [1,2]. The objective of the work reported here was to utilize the
aperture sampling and high resolution spectral estimation methods discussed
earlier to analyze the field measurement data of the L-band ground reflection
signals for better characterization of the ground reflection elevation
multipath and for improved estimation of the target elevation angle.

Figure 3-1 shows the aperture sampling equipment utilized in the field
measurements. The sampled aperture consisted of a 5 element 6.5λ line array
(for evaluation of a small aperture tracker performance) as well as a 9
element 26λ line array (for fine grain resolution of various multipath
components). Sensors in both array configurations were uniformly spaced. The
beamwidths of these two arrays were approximately 7° and 1.75°, respectively.
The received signal consisted of 1090 MHz replies from a standard air traffic
control radar beacon (ATCRB) on board an aircraft in response to the ground
interrogations. The amplitude and phase* of the received signal at each of 11
L-band dipoles (used as sensors) was digitized and recorded on magnetic
disks. Also recorded were the digitized elevation angle of the target
aircraft obtained from a tracking theodolite and various relevant
environmental data.

The received signal consists of a plane wave at positive elevation angle
(corresponding to the direct signal coming from the aircraft ATCRB) and other
plane waves generally at negative elevation angles (corresponding to various
ground reflections from terrain features). Thus, as shown in Figure 1-1, we
expect that the angular power spectrum of the received signal (i.e., the
received signal power as a function of elevation angle) will consist of a
narrow peak at the direct signal elevation angle, narrow peaks at the arrival
angles of the major specular ground reflections and wider peaks in regions of
diffuse scattering [1].

*The RF phase was measured relative to a reference dipole while the
amplitude was measured on calibrated log video receivers.

In the results presented below, the maximum entropy (ME) angular power
spectrum was calculated using the Burg technique [9,14], and the filter length
of the corresponding prediction error filter was determined using Akaike's
final prediction error criterion [15].

Figure 3-2 shows the experimental angular power spectral estimates for a
special measurement at the MIT Lincoln Laboratory antenna test range where
the elevation array was laid sideways horizontally on the ground so as to
have only a single plane wave incident on the array. The results are seen to
correspond closely to the synthetic data result of Figure 2-1, and are viewed
as providing a degree of validation for our data recording and analysis
procedure.

Ground reflection field measurements were made for various terrain
conditions. Results for both near-flat terrain and rolling terrain are given
below. For comparison purposes, both the experimental angular power spectral
estimates from the field measured data and the corresponding simulated
spectral estimates (using the multipath computer simulation program developed
for the Microwave Landing System multipath simulations [13]) are shown in the
same figure.

Figure 3-3 shows the angular power spectral estimates for a flight test
in which the target helicopter was at an angle of 1.4° and at a range of 0.4
nmi. Figure 3-4 shows the terrain height profile and the corresponding ground
model used to generate the simulated spectral estimates. The terrain in front
of the receiving antenna array consisted of a fairly flat grass field adjacent
to the main runway at Hanscom airport, Mass. Thus, it is expected that the
ground reflected signal would be primarily a specular reflection from the
fairly flat ground which had been attenuated by the grass cover. In both
measured and simulated results, it can be seen that all three angular power
spectral estimates suggest the presence of two signals (one direct signal and
one ground reflected signal); however, the ME spectral estimate appears to
offer higher resolution of the signals as well as lower background spectral
level. It has been shown that the area under an ME spectral peak provides a
good estimate of the component power [4]. Based on this estimate of the
component power, the estimated specular reflected signal power relative to
the direct signal in Figure 3-3a is -3 dB which compares reasonably well with
-3.5 dB for the corresponding simulated result in Figure 3-3b. Also, the
estimated arrival angles of the ground reflected signals agree fairly well
between the field measurement and the corresponding simulation results.


Figure 3-5 shows the spectral estimates for a flight test in which the
target helicopter was at 4.2° and 0.6 nmi. This field test was taken at the
golf course of Fort Devens, Mass. Figure 3-6 shows the terrain height
profile and the corresponding ground model used to produce the simulated
spectral estimates. Here, the terrain in front of the receiving antenna array
has various downward and upward slopes within a roughly level horizon and the
ground was covered quite uniformly by short grass. This type of rolling
terrain can often give rise to "focusing" terrain reflections, i.e., more
than one specular reflection present at a given time. We see in Figure 3-5
that both the field measured result and the simulation result indicate the
existence of two ground reflected signals, one at -6.0° and the other at
-1.7°, with the latter having the lower multipath level. Again, the ME
spectral estimate appears to yield better resolution of various arriving
signals and to give lower background spectral level.

Figure 3-7 shows experimental results for target elevation angle
estimation of a flight test at the golf course of Fort Devens, Mass. The
flight path of
the  target  helicopter  was  vertical  descent  at  a  range  of  0 .'§  nmi  covering  ele¬ 
vation  angles  from  7.5°  to  1.5°.  We  see  that  the  elevation  angle  estimator 
based  on  the  ME  spectral  estimates  generally  yields  smaller  angular  errors 
than  the  conventional  monopulse,  especially  in  the  low  elevation  angle  region. 

IV.  Summary 

Our  preliminary  results  from  the  analysis  of  the  low  angle  terrain  scat¬ 
tering  field  measurements  by  utilizing  the  high  resolution  spectral  estimation 
techniques  suggest  that  these  modern  spectral  estimation  methods,  especially 
the  ME  method,  can  be  effectively  used  for  ground  reflection  elevation  multi- 
path  characterization  and  for  improved  target  elevation  angle  estimation. 
However,  several  problems  associated  with  applying  these  promising  techniques 
to  such  array  data  need  additional  study.  These  include  (1)  choice  of  the 
"correct"  prediction  error  filter  length  in  the  ME  method,  (2)  the  proper  es¬ 
timation  of  the  covariance  matrix  to  be  used  in  the  ML  method,  and  (3)  alter¬ 
native  estimators  which  are  less  sensitive  to  the  relative  phase  between  the 
various  received  signals. 


Acknowledgments 

R. Sandholm and P. Lanzillotti designed and operated the equipment used
in the experimental measurements. J. Reid developed the bulk of the software
and operated the on-site computer in the field measurements. K. Roberts typed
the manuscript and captioned the figures. I. Stiglitz provided encouragement
and assistance in commencing and carrying out the studies reported here.




References


1.  Barton, D., June 1974, "Low-Angle Radar Tracking", Proc. of IEEE, p. 687.

2.  White, W.D., November 1974, "Low Angle Radar Tracking in the Presence
    of Multipath", IEEE Trans. on AES, Vol. AES-10, No. 6, p. 835.

3.  Capon, J., August 1969, "High-Resolution Frequency-Wavenumber Spectrum
    Analysis", Proc. of IEEE, Vol. 57, p. 1408.

4.  Lacoss, R.T., August 1971, "Data Adaptive Spectral Analysis Methods",
    Geophysics, Vol. 36, No. 4, p. 661.

5.  Burg, J.P., October 1967, "Maximum Entropy Spectral Analysis", paper
    presented at 37th International SEG Meeting, Oklahoma City, Oklahoma.

6.  Van Den Bos, A., 1971, "Alternative Interpretation of Maximum Entropy
    Spectral Analysis", IEEE Trans. on Inform. Theory, IT-17, p. 493.

7.  Special Issue on Adaptive Antennas, IEEE Trans. on Antennas and
    Propagation, Vol. AP-24, No. 5, September 1976.

8.  Harris, F., January 1978, "On the Use of Windows for Harmonic Analysis
    With the Discrete Fourier Transform", Proc. of IEEE, Vol. 66, No. 1,
    pp. 51-83.

9.  Andersen, N., February 1974, "On the Calculation of Filter Coefficients
    for Maximum Entropy Spectral Analysis", Geophysics, Vol. 39, pp. 69-72.

10. Sklar, J.R., and Schweppe, F.C., 1964, "The Angular Resolution of
    Multiple Targets", M.I.T. Lincoln Laboratory, Rpt. 1964-2.

11. Pollon, G.E., 1967, "On the Angular Resolution of Multiple Targets",
    IEEE Trans. Aerosp. Electron. Syst. (Corresp.), Vol. AES-3, pp. 145-148.

12. Peterson, A.M., et al, 1976, "Low Angle Radar Tracking", Stanford
    Research Institute Report JSR74-7.

13. Capon, J., April 1976, "Multipath Parameter Computations for the MLS
    Simulation Computer Program", M.I.T. Lincoln Laboratory Project Report
    ATC-68, FAA-RD-76-55.

14. Barnard, T.E., "The Maximum Entropy Spectrum and the Burg Technique",
    Advanced Signal Processing Technical Report No. 1, Texas Instruments
    Inc., ALEX(03)-TR-75-01.

15. Akaike, H., 1970, "Statistical Predictor Identification", Ann. Inst.
    Statist. Math., Vol. 22, p. 205.

16. Cox, H., 1973, "Resolving Power and Sensitivity to Mismatch of Optimum
    Array Processors", Jour. Acoust. Soc. America, Vol. 54, pp. 771-785.




Figures 


FIGURE 1-1a Multipath Propagation Phenomena of Interest.
[Legend: SR, specular reflection; DR, diffuse reflection.]

FIGURE 1-1b Relationship of Received Power to
Low-Angle Multipath Environment.

[Figure 1-2 panel: antenna aperture. The spatial sample s(x_k) and the time
sample s(t_k) are each written as a sum of complex exponentials, in
x_k sin(theta_i) and in f_i t_k respectively; x_k is the displacement from the
aperture phase center in wavelengths.]

FIGURE 1-2 Duality Between Angular Estimation with
Line Array and Frequency Estimation for Time Series.


[Figure 3-3 panels: a) field measured results, 9 sensors spaced 3.24λ apart,
Hanscom field measurement 10/21/77; b) simulation results, 9 sensors spaced
3.24λ apart. Horizontal axis: elevation angle in degrees.]

FIGURE 3-3 Angular Spectrum Estimates for
Field Measurements with Terrain Reflection: Near-Flat Terrain.

FIGURE 3-4 Terrain Height Profile at Hanscom AFB: Near-Flat Terrain.


[Figure 3-5 panels: a) field measured results, 9 sensors spaced 3.24λ apart,
Fort Devens field measurements 5/1/78; b) simulation results, 9 sensors spaced
3.24λ apart. Curves are labeled "ME 3-pole model"; the theodolite angle is
marked. Horizontal axis: elevation angle in degrees, -7.5 to 7.5.]

FIGURE 3-5 Angular Spectral Estimates for Field Measurements
with Terrain Reflections: Rolling Terrain.

[Figure 3-6 panels: surveyed terrain height profile versus distance away from
the receiving antenna (ft); ground model, end view; ground model, top view.]

FIGURE 3-6 Terrain Height Profile at Fort Devens
Golf Course: Rolling Terrain.


[Figure 3-7 legend: M, angle estimator based on the ME spectral estimate;
O, off-axis monopulse; S, standard monopulse. Elevation antenna: 5 sensors
spaced 1.62λ apart. Horizontal axis: aircraft elevation angle in degrees.]

FIGURE 3-7 Angular Error in Target Elevation Angle Estimation
for the Field Measurements at Fort Devens, Mass.


MULTIPLE  EMITTER  LOCATION  AND  SIGNAL  PARAMETER  ESTIMATION 


RALPH  SCHMIDT 


ESL,  Incorporated 
495  Java  Drive 
Sunnyvale,  CA  94086 


Abstract 


Processing  the  signals  received  on  an  array  of  sensors  for 
the  location  of  the  emitter  is  of  great  enough  interest  to  have 
been  treated  under  many  special  case  assumptions. 

The general problem considers sensors with arbitrary locations
and arbitrary directional characteristics (gain/phase/polarization)
in a noise/interference environment of arbitrary covariance matrix.

This report is concerned first with the multiple emitter
aspect of this problem and second with the generality of solution.
A description is given of the Multiple Signal Classification
(MUSIC) algorithm, which provides asymptotically unbiased
estimates of

1.  number  of  incident  wavefronts  present 

2.  directions-of-arrival  (or  emitter  locations) 

3.  strengths  and  cross-correlations  among  the  incident 
waveforms 

4.  noise/interference  strength. 

Examples and comparisons with methods based on Maximum Likelihood
and Maximum Entropy, as well as conventional beamforming,
are included. An example of its use as a multiple frequency
estimator operating on time series is also included.




Introduction 


The  term  multiple  signal  classification  (MUSIC)  is  used  to 
describe  experimental  and  theoretical  techniques  involved  in 
determining  the  parameters  of  multiple  wavefronts  arriving  at  an 
antenna  array  from  measurements  made  on  the  signals  received  at 
the  array  elements. 

The general problem considers antennas with arbitrary locations
and arbitrary directional characteristics (gain/phase/polarization)
in a noise/interference environment of arbitrary
covariance matrix. The Multiple Signal Classification (MUSIC)
approach is described; it can be implemented as an algorithm to
provide asymptotically unbiased estimates of

1.  Number  of  signals 

2.  Directions-of-arrival 

3.  Strengths  and  cross-correlations  among  the  directional 
waveforms 

4.  Polarizations 

5.  Strength of noise/interference.

These  techniques  are  very  general  and  of  wide  application. 
Special cases of MUSIC are

1.  Conventional  Interferometry 

2.  Monopulse  DF,  i.e.,  using  multiple  colocated  antennas 

3.  Multiple  Frequency  Estimation. 

The  Data  Model 

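The waveforms received at the M array elements are linear
combinations of the D incident wavefronts and noise. Thus, the
multiple signal classification (MUSIC) approach begins with the
following model for characterizing the received M vector X as in

$$
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_M \end{bmatrix}
=
\begin{bmatrix} a(\theta_1) & a(\theta_2) & \cdots & a(\theta_D) \end{bmatrix}
\begin{bmatrix} F_1 \\ F_2 \\ \vdots \\ F_D \end{bmatrix}
+
\begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_M \end{bmatrix}
$$

or

$$ X = AF + W \tag{1} $$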


The incident signals are represented in amplitude and phase
at some arbitrary reference point (for instance the origin of
the coordinate system) by the complex quantities $F_1, F_2, \ldots, F_D$.
The noise, whether "sensed" along with the signals or generated
internal to the instrumentation, appears as the complex vector W.

The elements of X and A are also complex in general. The
$a_{ij}$ are known functions of the signal arrival angles and the
array element locations. That is, $a_{ij}$ depends on the $i$th array
element, its position relative to the origin of the coordinate
system, and its response to a signal incident from the direction
of the $j$th signal. The $j$th column of A is a "mode" vector
$a(\theta_j)$ of responses to the direction-of-arrival $\theta_j$ of the $j$th
signal. Knowing the mode vector $a(\theta_1)$ is tantamount to knowing
$\theta_1$ (unless $a(\theta_1) = a(\theta_2)$ with $\theta_1 \neq \theta_2$, an unresolvable
situation, a type I ambiguity).


In geometrical language, the measured X vector can be
visualized as a vector in M dimensional space. The directional
mode vectors $a(\theta_j)$, with elements $a_{ij}$ for $i = 1, 2, \ldots, M$, i.e., the
columns of A, can also be so visualized. Equation (1) states that X is
a particular linear combination of the mode vectors; the elements
of F are the coefficients of the combination. Note that
the X vector is confined to the range space of A. That is, if
A has 2 columns, the range space is no more than a 2-dimensional
subspace within the M space and X necessarily lies in the subspace.
Also note that $a(\theta)$, the continuum of all possible mode
vectors, lies within the M space but is quite nonlinear. For
help in visualizing this, see Figure 1. For example, in an
azimuth-only DF system, $\theta$ will consist of a single parameter. In
an azimuth/elevation/range system, $\theta$ will be replaced by $\theta, \phi, r$,
for example. In any case, $a(\theta)$ is a vector continuum such as a
"snake" (azimuth only) or a "sheet" (AZ/EL) twisting and winding
through the M space. (In practice, the procedure by which the
$a(\theta)$ continuum is measured or otherwise established corresponds
to calibrating the array.)

In these geometrical terms (see Figure 1), the problem of
solving for the directions-of-arrival of multiple incident wavefronts
consists of locating the intersections of the $a(\theta)$
continuum with the range space of A. The range space of A is, of
course, obtained from the measured data. The means of obtaining
the range space and, necessarily, its dimensionality (the number
D of incident signals) follows.
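The data model can be made concrete with a small numerical sketch. Everything specific in it is an assumption chosen for illustration and is not taken from the paper: a uniform line array of omnidirectional elements at half-wavelength spacing stands in for the calibrated a(θ) continuum, and the names `mode_vector` and `snapshots` are hypothetical. The sketch only illustrates that noise-free snapshots X = AF stay in the range space of A, while added noise spreads them over the whole M space.

```python
import numpy as np

def mode_vector(theta_deg, m, spacing=0.5):
    """a(theta) for an assumed uniform line array of m omnidirectional
    elements; spacing is in wavelengths (a stand-in for array calibration)."""
    k = np.arange(m)
    return np.exp(2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def snapshots(thetas_deg, m, n_snap, noise_std, rng):
    """Return X (m x n_snap) whose columns follow X = A F + W, and A itself."""
    A = np.column_stack([mode_vector(t, m) for t in thetas_deg])
    d = A.shape[1]
    F = rng.standard_normal((d, n_snap)) + 1j * rng.standard_normal((d, n_snap))
    W = noise_std * (rng.standard_normal((m, n_snap))
                     + 1j * rng.standard_normal((m, n_snap)))
    return A @ F + W, A

rng = np.random.default_rng(0)
X_clean, A = snapshots([-10.0, 25.0], m=6, n_snap=50, noise_std=0.0, rng=rng)
X_noisy, _ = snapshots([-10.0, 25.0], m=6, n_snap=50, noise_std=0.1, rng=rng)

# Rank of the snapshot matrix = dimension of the subspace the data occupy.
rank_clean = np.linalg.matrix_rank(X_clean)
rank_noisy = np.linalg.matrix_rank(X_noisy)
```

With two incident wavefronts the noise-free data occupy a 2-dimensional subspace of the 6-dimensional space, which is the geometric picture the text describes.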

The  S  Matrix 


The M×M covariance matrix of the X vector is

$$ S \triangleq \overline{XX^*} = A\,\overline{FF^*}\,A^* + \overline{WW^*} $$

or

$$ S = APA^* + \lambda S_0 \tag{2} $$

under the basic assumption that the incident signals and the
noise are uncorrelated. Note that the incident waveforms
represented by the elements of F may be uncorrelated (the D×D
matrix $P \triangleq \overline{FF^*}$ is diagonal) or may contain completely
correlated pairs (P is singular). In general, P will be "merely"
positive definite, which reflects the arbitrary degrees of pairwise
correlations occurring between the incident waveforms.

When the number of incident wavefronts D is less than the
number of array elements M, then $APA^*$ is singular; it has a
rank less than M. Therefore

$$ |APA^*| = |S - \lambda S_0| = 0 \tag{3} $$

This equation is only satisfied with $\lambda$ equal to one of the
eigenvalues of S in the metric of $S_0$. But, for A full rank and
P positive definite, $APA^*$ must be nonnegative definite. Therefore
$\lambda$ can only be the minimum eigenvalue $\lambda_{\min}$. Therefore, any
measured $S = \overline{XX^*}$ matrix can be written

$$ S = APA^* + \lambda_{\min} S_0, \qquad \lambda_{\min} \ge 0 \tag{4} $$

where $\lambda_{\min}$ is the smallest solution to $|S - \lambda S_0| = 0$. Note the
special case wherein the elements of the noise vector W are
mean zero, variance $\sigma^2$; in which case, $\lambda_{\min} S_0 = \sigma^2 I$.

Calculating  a  Solution 


The rank of $APA^*$ is D and can be determined directly from
the eigenvalues of S in the metric of $S_0$. That is, in the
complete set of eigenvalues of S in the metric of $S_0$, $\lambda_{\min}$ will
not always be simple. In fact, it occurs repeated N = M - D times.
This is true because the eigenvalues of S and those of
$S - \lambda_{\min} S_0 = APA^*$ differ by $\lambda_{\min}$ in all cases. Since the minimum
eigenvalue of $APA^*$ is zero (being singular), $\lambda_{\min}$ must occur
repeated N times. Therefore, the number of incident signals
estimator is

$$ \hat{D} = M - \hat{N} \tag{5} $$

where $\hat{N}$ = the multiplicity of $\lambda_{\min}(S, S_0)$ and $\lambda_{\min}(S, S_0)$ is read
"$\lambda_{\min}$ of S in the metric of $S_0$." (In practice, one can expect
that the multiple $\lambda_{\min}$'s will occur in a cluster rather than all
precisely equal. The "spread" on this cluster decreases as more
data is processed.)
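The counting rule for the number of signals can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the generalized eigenproblem "S in the metric of S_0" is solved by Cholesky whitening, the cluster of minimum eigenvalues is detected with a hypothetical relative tolerance `rel_tol`, and the array, signal powers and noise metric in the example are invented for the demonstration.

```python
import numpy as np

def eig_in_metric(S, S0):
    """Solve S e = lam S0 e for Hermitian S and positive definite S0."""
    L = np.linalg.cholesky(S0)                 # S0 = L L^H
    Linv = np.linalg.inv(L)
    lam, V = np.linalg.eigh(Linv @ S @ Linv.conj().T)   # ascending eigenvalues
    return lam, Linv.conj().T @ V              # back-transform e = L^{-H} v

def estimate_num_signals(S, S0, rel_tol=0.05):
    """D_hat = M - N_hat, with N_hat the size of the lambda_min cluster.
    rel_tol (fraction of the eigenvalue spread) is an assumed tuning knob."""
    lam, _ = eig_in_metric(S, S0)
    spread = lam[-1] - lam[0]
    n_hat = int(np.sum(lam - lam[0] <= rel_tol * spread))
    return S.shape[0] - n_hat, lam

# Example: 5 elements, 2 uncorrelated signals, noise with non-identity metric.
m = 5
k = np.arange(m)
A = np.column_stack([np.exp(1j * np.pi * k * np.sin(np.deg2rad(t)))
                     for t in (-10.0, 25.0)])
P = np.diag([2.0, 1.0])
S0 = np.diag([1.0, 2.0, 1.0, 0.5, 1.0])
S = A @ P @ A.conj().T + 0.1 * S0
d_hat, lam = estimate_num_signals(S, S0)
```

With measured rather than exact S, the three smallest eigenvalues would form a cluster instead of coinciding, so the tolerance would have to be chosen with the amount of data in mind.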


The  Signal  and  Noise  Subspaces 


The M eigenvectors of S in the metric of $S_0$ must satisfy
$S e_i = \lambda_i S_0 e_i$, $i = 1, 2, \ldots, M$. Since $S = APA^* + \lambda_{\min} S_0$, we
have $APA^* e_i = (\lambda_i - \lambda_{\min}) S_0 e_i$. Clearly, for each of the $\lambda_i$
that is equal to $\lambda_{\min}$ (there are N) we must have $APA^* e_i = 0$
or $A^* e_i = 0$. That is, the eigenvectors associated with
$\lambda_{\min}(S, S_0)$ are orthogonal to the space spanned by the columns
of A; the incident signal mode vectors!

Thus we may justifiably refer to the N dimensional subspace
spanned by the N noise eigenvectors as the noise subspace and
the D dimensional subspace spanned by the incident signal mode
vectors as the signal subspace; they are disjoint.


The  Algorithm 


We now have the means to solve for the incident signal mode
vectors. If $E_N$ is defined to be the M×N matrix whose columns
are the N noise eigenvectors, and the ordinary Euclidean distance
(squared) from a vector Y to the signal subspace is
$d^2 = Y^* E_N E_N^* Y$, we can plot $1/d^2$ for points along the $a(\theta)$
continuum as a function of $\theta$. That is,

$$ P_{MU}(\theta) = \frac{1}{a^*(\theta)\, E_N E_N^*\, a(\theta)} \tag{6} $$

(However, the $a(\theta)$ continuum may intersect the D dimensional
signal subspace more than D times; another unresolvable situation
occurring only for the case of multiple incident signals, a
type II ambiguity.) It is clear from the expression that MUSIC
is asymptotically unbiased even for multiple incident wavefronts
because S is asymptotically perfectly measured so that $E_N$ is
also. $a(\theta)$ does not depend on the data.
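Plotting 1/d² along the a(θ) continuum can be sketched numerically as below. The sketch rests on assumptions that are not in the paper: the noise metric is taken as S_0 = I so that an ordinary eigendecomposition suffices, a half-wavelength uniform line array plays the role of the calibrated continuum, and the peak picker is a deliberately simple local-maximum search. The function names are hypothetical.

```python
import numpy as np

def mode_vector(theta_deg, m, spacing=0.5):
    """Assumed a(theta): uniform line array, spacing in wavelengths."""
    k = np.arange(m)
    return np.exp(2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def music_spectrum(S, d, grid_deg):
    """1 / (a*(theta) E_N E_N* a(theta)) on a grid, assuming S0 = I."""
    m = S.shape[0]
    _, E = np.linalg.eigh(S)                  # eigenvalues ascending
    En = E[:, : m - d]                        # the N = M - D noise eigenvectors
    A = np.column_stack([mode_vector(t, m) for t in grid_deg])
    d2 = np.sum(np.abs(En.conj().T @ A) ** 2, axis=0)
    return 1.0 / np.maximum(d2, 1e-12)        # floor avoids division by zero

def pick_peaks(p, grid_deg, d):
    """Return the angles of the d largest local maxima of p."""
    idx = [i for i in range(1, len(p) - 1) if p[i] > p[i - 1] and p[i] >= p[i + 1]]
    idx = sorted(idx, key=lambda i: p[i], reverse=True)[:d]
    return sorted(float(grid_deg[i]) for i in idx)

# Example with an exactly known S: two signals in white noise.
m = 6
A = np.column_stack([mode_vector(t, m) for t in (-10.0, 25.0)])
S = A @ np.diag([2.0, 1.0]) @ A.conj().T + 0.1 * np.eye(m)
grid = np.arange(-90.0, 90.5, 0.5)
p_mu = music_spectrum(S, d=2, grid_deg=grid)
peaks = pick_peaks(p_mu, grid, d=2)
```

Because S is exact here, the two largest peaks fall on the true directions; with a finite data record they would be perturbed, and the grid spacing limits how finely they can be located.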

Once the directions-of-arrival of the D incident signals
have been found, the A matrix becomes available and may be used
to compute the parameters of the incident signals. The solution
for the P matrix is direct and can be expressed in terms of
$(S - \lambda_{\min} S_0)$ and A. That is, since $APA^* = S - \lambda_{\min} S_0$,

$$ P = (A^*A)^{-1} A^* \,(S - \lambda_{\min} S_0)\, A\,(A^*A)^{-1} \tag{7} $$
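The recovery of P from S, λ_min, S_0 and A is a pair of least-squares projections and can be checked numerically. In this sketch the function name `signal_covariance` and the example array, correlation values and noise level are all assumptions made for the demonstration.

```python
import numpy as np

def signal_covariance(S, S0, A, lam_min):
    """P = (A*A)^{-1} A* (S - lam_min S0) A (A*A)^{-1}."""
    left = np.linalg.solve(A.conj().T @ A, A.conj().T)   # (A*A)^{-1} A*
    # (A*A)^{-1} is Hermitian, so the right-hand factor is left^H.
    return left @ (S - lam_min * S0) @ left.conj().T

# Example: two partially correlated signals on an assumed 6-element line array.
m = 6
k = np.arange(m)
A = np.column_stack([np.exp(1j * np.pi * k * np.sin(np.deg2rad(t)))
                     for t in (-10.0, 25.0)])
P_true = np.array([[2.0, 0.5 + 0.3j],
                   [0.5 - 0.3j, 1.0]])
S0 = np.eye(m)
S = A @ P_true @ A.conj().T + 0.1 * S0
P_hat = signal_covariance(S, S0, A, lam_min=0.1)
```

The diagonal of the result gives the signal strengths and the off-diagonal terms the cross-correlations among the incident waveforms.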

Including  Polarization 

Consider a signal arriving from a specific direction $\theta_0$.
Assume that the array is not diverse in polarization; i.e., all
elements are identically polarized, say, vertically. Certainly
the DF system will be most sensitive to vertically polarized
energy, completely insensitive to horizontal and partially sensitive
to arbitrarily polarized energy. The array is only sensitive
to the vertically polarized component of the arriving
energy.

For a general or polarizationally diverse array, the mode
vector corresponding to the direction $\theta_0$ depends on the signal
polarization. A vertically polarized signal will induce one
mode vector and horizontal another, and right hand circular (RHC)
still another.

Recall that signal polarization can be completely characterized
by a single complex number q. We can "observe" how the
mode vector changes as the polarization parameter q for the
emitter changes at the specific direction $\theta_0$. It can be proven
that as q changes through all possible polarizations, the mode
vector sweeps out a two-dimensional "polarization subspace."
Thus, only two independent mode vectors spanning the polarization
subspace for the direction $\theta_0$ are needed to represent any
emitter polarization q at direction $\theta_0$. The practical embodiment
of this is that only the mode vectors of two emitter polarizations
need be calculated or kept in store for direction $\theta_0$ in
order to solve for emitter polarizations, where only one was
needed to solve for DOA in a system with an array that was not
polarizationally diverse.

These arguments lead to an equation similar to Equation (6)
for $P_{MU}(\theta)$ but including the effects of polarization diversity
among the array elements.

$$
P_{MU}(\theta) = \frac{1}{\lambda_{\min}\left\{
\begin{bmatrix} a_x^*(\theta) \\ a_y^*(\theta) \end{bmatrix}
E_N E_N^*
\begin{bmatrix} a_x(\theta) & a_y(\theta) \end{bmatrix}
\right\}} \tag{8}
$$

where $a_x(\theta)$ and $a_y(\theta)$ are the two continua corresponding to, for
example, separately taken x and y linear incident wavefront
polarizations. The eigenvector corresponding to $\lambda_{\min}$ in Equation (8)
provides the polarization parameter q since it is of
the form $[1 \;\; q]^T$.
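The polarization-diverse case reduces, at each candidate direction, to a 2×2 Hermitian eigenproblem. The sketch below illustrates this under loudly stated assumptions: the element model (linear antennas with hypothetical tilt angles on a half-wavelength line array, responding with cos(tilt) to x polarization and sin(tilt) to y polarization) is invented for the example, S_0 = I is assumed, and the names `dual_pol_modes` and `music_with_polarization` are hypothetical.

```python
import numpy as np

def dual_pol_modes(theta_deg, tilts_rad, spacing=0.5):
    """Return the M x 2 matrix [a_x(theta) a_y(theta)] for an assumed array
    of tilted linear elements on a uniform line (spacing in wavelengths)."""
    k = np.arange(len(tilts_rad))
    phase = np.exp(2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))
    return np.column_stack([np.cos(tilts_rad) * phase,
                            np.sin(tilts_rad) * phase])

def music_with_polarization(En, theta_deg, tilts_rad):
    """Evaluate 1/lambda_min of the 2 x 2 matrix and the implied q."""
    B = dual_pol_modes(theta_deg, tilts_rad)
    G = B.conj().T @ En @ En.conj().T @ B      # 2 x 2 Hermitian matrix
    lam, V = np.linalg.eigh(G)                 # ascending eigenvalues
    v = V[:, 0]                                # eigenvector of lambda_min
    q = v[1] / v[0]                            # scale so the vector reads [1, q]
    return 1.0 / max(lam[0], 1e-12), q

# Example: one emitter at 20 degrees with polarization parameter q0.
tilts = np.array([0.0, 0.6, 1.2, 0.3, 0.9, 1.5])
q0 = 0.5j
a = dual_pol_modes(20.0, tilts) @ np.array([1.0, q0])
S = np.outer(a, a.conj()) + 0.1 * np.eye(len(tilts))
_, E = np.linalg.eigh(S)
En = E[:, :-1]                                 # 5 noise eigenvectors (D = 1)
p_on, q_hat = music_with_polarization(En, 20.0, tilts)
p_off, _ = music_with_polarization(En, 40.0, tilts)
```

The ratio `v[1] / v[0]` is undefined for a purely y-polarized emitter (first component zero); a practical routine would need a different normalization for that case.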

The  Algorithm 


In summary, the steps of the algorithm are

Step 0  Collect data, form S
Step 1  Calculate eigenstructure of S in metric of $S_0$
Step 2  Decide number of signals D; Equation (5)
Step 3  Evaluate $P_{MU}(\theta)$ vs. $\theta$; Equation (6) or (8)
Step 4  Pick D peaks of $P_{MU}(\theta)$
Step 5  Calculate remaining parameters; Equation (7).


The  above  steps  have  been  implemented  in  several  forms  to  verify 
and  evaluate  the  principles  and  basic  performance.  Field  tests 
have  been  conducted  using  actual  receivers,  arrays,  and  multiple 
transmitters.  The  results  of  these  tests  have  demonstrated  the 
potential  of  this  approach  for  handling  multiple  signals  in 
practical  situations.  Performance  results  are  being  prepared 
for  presentation  in  another  paper. 

Comparison  With  Other  Methods 

In comparing MUSIC with ordinary beamforming (BF), Maximum
Likelihood (ML) and Maximum Entropy (ME), the following expressions
were used. See Figures 3 and 4.

$$ P_{BF}(\theta) = a^*(\theta)\, S\, a(\theta) $$

$$ P_{ML}(\theta) = \frac{1}{a^*(\theta)\, S^{-1} a(\theta)} $$

$$ P_{ME}(\theta) = \frac{1}{a^*(\theta)\, c\, c^*\, a(\theta)} $$

where c is a column of $S^{-1}$. The beamformer expression calculates,
for plotting, the power one would measure at the output of
a beamformer (summing the array element signals after inserting
delays appropriate to steer or look in a specific direction) as
a function of the direction.


$P_{ML}(\theta)$ calculates the log likelihood function under the
assumptions that X is a mean zero, multivariate Gaussian and
that there is only a single incident wavefront present. For
multiple incident wavefronts, $P_{ML}(\theta)$ becomes

[expression illegible in the source]

which implies a D dimensional search (and plot!).

$P_{ME}(\theta)$ is based on selecting 1 of the M array elements as
a "reference" and attempting to find weights to be applied to
the remaining M-1 received signals so that their sum is an
MMSE fit to the reference. Since there are M possible references,
there are M generally different $P_{ME}(\theta)$'s obtained from
the M possible column selections from $S^{-1}$. In the comparison
plots, a particular reference was consistently selected.
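The three comparison expressions (beamformer, single-wavefront Maximum Likelihood, and Maximum Entropy with a chosen reference column of the inverse of S) can be evaluated on a grid as sketched below. The half-wavelength uniform line array, the function names, and the example covariance are assumptions for illustration only; the `ref` argument selects which column of the inverse plays the role of c.

```python
import numpy as np

def mode_vector(theta_deg, m, spacing=0.5):
    """Assumed a(theta): uniform line array, spacing in wavelengths."""
    k = np.arange(m)
    return np.exp(2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def comparison_spectra(S, grid_deg, ref=0):
    """Return (P_BF, P_ML, P_ME) evaluated on grid_deg."""
    m = S.shape[0]
    Sinv = np.linalg.inv(S)
    c = Sinv[:, ref]                                   # one column of S^{-1}
    A = np.column_stack([mode_vector(t, m) for t in grid_deg])
    # a*(theta) S a(theta) for every grid column at once
    p_bf = np.real(np.einsum("ij,ik,kj->j", A.conj(), S, A))
    p_ml = 1.0 / np.real(np.einsum("ij,ik,kj->j", A.conj(), Sinv, A))
    # a* c c* a = |c* a|^2
    p_me = 1.0 / np.maximum(np.abs(c.conj() @ A) ** 2, 1e-300)
    return p_bf, p_ml, p_me

# Example: a single wavefront from 20 degrees in white noise.
m = 6
a0 = mode_vector(20.0, m)
S = np.outer(a0, a0.conj()) + 0.1 * np.eye(m)
grid = np.arange(-90.0, 91.0, 1.0)
p_bf, p_ml, p_me = comparison_spectra(S, grid)
theta_bf = float(grid[np.argmax(p_bf)])
theta_ml = float(grid[np.argmax(p_ml)])
```

Changing `ref` generally changes the Maximum Entropy curve, which is the dependence on the reference element that the text points out.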

An example of the completely general MUSIC algorithm
applied to a problem of steering a multiple feed parabolic dish
antenna is shown in Figure 5. $(\sin x)/x$ pencil beamshapes skewed
slightly off boresight are assumed for the element patterns.
Since the six antennas are essentially colocated, the DF
capacity arises out of the antenna beam pattern diversity.
The computer was used to simulate the "noisy" S matrix that would
arise in practice for the conditions desired and then to subject
it to the MUSIC algorithm. Figure 5 shows how three directional
signals are distinguished and their polarizations estimated
even though two of the arriving signals are highly similar
(90% correlated).

The application of MUSIC to the estimation of the frequencies
of multiple sinusoids (arbitrary amplitudes and phases)
for a very limited duration data sample is shown in Figure 6.
The figure suggests that, even though there was no actual noise
included, the rounding of the data samples to six decimal
digits has already destroyed a significant portion of the
information present in the data needed to resolve the several
frequencies.


Summary  and  Conclusions 

As this paper was being prepared, the works of Gething [1]
and Davies [2] were discovered, offering a part of the solution
discussed here but in terms of simultaneous equations and
special linear relationships without recourse to eigenstructure.
However, the geometric significance of a vector space setting
and the interpretation of the S matrix eigenstructure were
missed. More recent work by Reddi [3] is also along the lines
of the work presented here, though limited to uniform, collinear
arrays of omnidirectional elements and also without clear
utilization of the entire noise subspace. Ziegenbein [4] applied the
same basic concept to time series spectral analysis, referring
to it as a Karhunen-Loeve Transform though treating aspects of
it as "ad hoc". El-Behery and MacPhie [5] and Capon [6] treat
the uniform collinear array of omnidirectional elements using
the Maximum Likelihood method. Pisarenko [7] also treats time
series and addresses only the case of a full complement of
sinusoids; i.e., a 1 dimensional noise subspace.

The approach presented here for Multiple Signal Classification
(MUSIC) is very general and of wide application. The
method is interpretable in terms of the geometry of complex
M spaces wherein the eigenstructure of the measured S matrix
plays the central role. MUSIC provides asymptotically unbiased
estimates of a general set of signal parameters approaching the
Cramer-Rao accuracy bound. MUSIC models the data as the sum of
point source emissions and noise rather than the convolution of
an all pole transfer function driven by a white noise (i.e.,
autoregressive modeling, Maximum Entropy) or maximizing a
probability under the assumption that the X vector is zero mean,
Gaussian (Maximum Likelihood for Gaussian data). In geometric
terms MUSIC minimizes the distance from the $a(\theta)$ continuum to
the signal subspace whereas Maximum Likelihood minimizes a
weighted combination of all component distances.

No  assumptions  have  been  made  about  array  geometry.  The 
array  elements  may  be  arranged  in  a  regular  or  irregular  pattern 
and  may  differ  or  be  identical  in  directional  characteristics 
(amplitude/phase)  provided  their  polarization  characteristics  are 
all  identical.  The  extension  to  include  general  polarization- 
ally  diverse  antenna  arrays  will  be  more  completely  described  in 
a  separate  paper. 


References 


1.  Gething, P.J.D., Oct. 1971, Analysis of Multicomponent
    Wavefields, Proc. IEE, Vol. 118, no. 10.

2.  Davies, D.E.N., March 1967, Independent Angular Steering
    of Each Zero of the Directional Pattern for a Linear Array,
    IEEE Trans. on Antennas and Propagation.

3.  Reddi, S.S., Jan. 1979, Multiple Source Location - A
    Digital Approach, IEEE Trans. on Aerospace and Electronic
    Systems, Vol. AES-15, no. 1.

4.  Ziegenbein, J., April 2-4, 1979, Spectral Analysis Using
    the Karhunen-Loeve Transform, 1979 IEEE Int'l. Conf. on
    ASSP, Washington, D.C., pp. 182-185.

5.  El-Behery, I.N., MacPhie, R.H., July 1977, Maximum Likelihood
    Estimation of Source Parameters from Time-Sampled
    Outputs of a Linear Array, J. Acoust. Soc. Am., Vol. 62,
    no. 1.

6.  Capon, J., Aug. 1969, High Resolution Frequency-Wavenumber
    Spectrum Analysis, Proc. IEEE, Vol. 57, no. 8.

7.  Pisarenko, V.F., 1973, The Retrieval of Harmonics from a
    Covariance Function, Geophys. J.R. Astr. Soc., Vol. 33,
    pp. 347-366.


[Figure 1 labels: the signal subspace (determined by the data); two signal
space eigenvectors; one noise space eigenvector; the a(θ) continuum of
direction-of-arrival vectors (determined by array geometry and
characteristics). Notes on the figure: e_1, e_2, e_3 are the eigenvectors of S
corresponding to eigenvalues λ_1 > λ_2 > λ_3 > 0; e_1, e_2 span the signal
subspace; a(θ_1), a(θ_2) are the incident signal mode vectors.]

FIGURE 1.  Geometric Portrayal for Three-Antenna Case


[Figure 2 output labels: D-vector of directions-of-arrival (azimuth only
case), D×1; D-vector of polarization parameters, D×1; D×D matrix of cross
and auto powers.]

FIGURE 2.  Block Diagram for Multiple Signal Classification


[Figure 3 annotations: SNR = 24 dB and SNR = 10 dB; equilateral triangle
array; BT = 100 (time-bandwidth). Axes: AOA from -200 to 200, dB from -10
to 40. Note on the Max Entropy curve: second predominant peak exhibits AOA
bias error.]

FIGURE 3.  Example of Azimuth-Only DF Performance


[Figure 4 annotation: no bias error; peak is very sharp.]

FIGURE 4.  Example of Azimuth-Only DF Performance
(Scale Expanded About Weaker Signal at 30°)


THE  MAXIMUM  ENTROPY  SPECTRAL  ESTIMATOR 
USED  AS  A  RADAR  DOPPLER  PROCESSOR+ 


SIMON  HAYKIN  and  KING  C.  CHAN 


Communications  Research  Laboratory 
Faculty  of  Engineering 
McMaster  University 
Hamilton,  Ontario,  Canada 


Abstract 


The  paper  describes  the  use  of  the  maximum  entropy  method  to  estimate  the 
Doppler  shift  of  a  moving  target  in  the  presence  of  additive  white  noise  or 
colored  noise  consisting  of  clutter  plus  white  noise.  Computer  simulation 
results  are  included,  which  show  that  in  a  background  of  additive  white  noise 
a  Doppler  processor  based  on  the  maximum  entropy  method  is  only  slightly  sub- 
optimal  with  respect  to  a  conventional  Doppler  processor  based  on  the  discrete 
Fourier  transform,  whereas  in  the  presence  of  additive  clutter  with  a  narrow 
spectral  width  (e.g.,  ground  clutter)  it  is  markedly  superior  in  performance 
to  the  conventional  processor  for  the  case  of  low  Doppler  targets. 

+This  research  was  supported  by  the  Department  of  Communications,  Ottawa, 
Canada. 

1.  Introduction

In  this  paper  we  describe  a  novel  application  of  the  maximum  entropy 
spectral  estimator  (MESE)  as  a  radar  Doppler  processor  in  which  the  Doppler 
frequency  shift  produced  by  a  moving  target  (e.g.,  an  aircraft)  as  it  moves 
radially  with  respect  to  the  radar  antenna,  is  used  to  detect  the  presence  of 
the  target  in  a  background  of  stationary  clutter  and/or  receiver  noise.  Two 
features  make  the  MESE  well-suited  for  this  application:  (1)  its  high  frequency 
resolution  capability,  and  (2)  the  fact  that  it  can  operate  with  a  relatively 
small  number  of  data  samples,  which  is  often  the  case  in  a  pulsed  radar  envir¬ 
onment. 

In Section 2 we briefly review the relevant features of the MESE, and in
Section 3 describe the use of this device for Doppler processing. In Section
4 we compare the performance of this processor with that of a Doppler
processor which uses the combination of a double-delay line canceler and discrete
Fourier transformer. The comparison is made for two different forms of
interference at the receiver input: (1) the interference consists of pure white
Gaussian noise, and (2) the interference consists of white noise plus a
clutter component.


2.  The Maximum-Entropy Spectral Estimator (MESE)


Consider a complex-valued weakly stationary time series $\{x_n\}$, where
n = 1, 2, ..., N. The algorithm used to design a maximum-entropy spectral
estimator (MESE) for such a time series involves two forms of prediction,
namely, forward prediction and backward prediction, with the resulting
prediction-errors denoted by $e_{f,n}^{(m)}$ and $e_{b,n}^{(m)}$ respectively, where m = 0, 1, 2, ..., M
refers to the pertinent stage of computation. These two prediction errors
may be computed using the equivalent lattice model of Fig. 1, where the
number of stages in the lattice model is denoted by M. The set of numbers $\{\rho_m\}$,
m = 1, 2, ..., M, are called the reflection coefficients of the estimator. The
input time series $\{x_n\}$ and the prediction-error time series are
orthogonal. Furthermore, the successive stages of the equivalent lattice
model are decoupled from each other, that is, the backward prediction-errors
in the model are orthogonal to each other. Accordingly, we may state the
following:

(1) The reflection coefficient $\rho_m$ at stage m of the model may be computed
    independently of the reflection coefficients of the stages following stage m.
    In fact, the global minimization of the prediction-error power with
    respect to the $\rho_m$ may be achieved as a sequence of local minimization
    problems, one at each stage. Specifically, we have

$$
\rho_m = \frac{-2 \sum_{n=m+1}^{N} e_{f,n}^{(m-1)}\, e_{b,n-1}^{(m-1)*}}
{\sum_{n=m+1}^{N} \left[ \left| e_{f,n}^{(m-1)} \right|^2 + \left| e_{b,n-1}^{(m-1)} \right|^2 \right]},
\qquad m = 1, 2, \ldots, M \tag{1}
$$

    where

$$ e_{f,n}^{(0)} = e_{b,n}^{(0)} = x_n \tag{2} $$

(2) To compute the forward prediction error $e_{f,n}^{(m)}$ at stage m, it is sufficient
    to update the forward prediction error $e_{f,n}^{(m-1)}$ with a constant (namely,
    the reflection coefficient $\rho_m$) times the delayed backward prediction-error
    $e_{b,n-1}^{(m-1)}$, with both $e_{f,n}^{(m-1)}$ and $e_{b,n-1}^{(m-1)}$ referring to the preceding
    stage m-1. The orthogonality of the data and error sequences ensures
    that this new forward prediction-error is smallest in the least-squares
    sense and that previous coefficients need not be changed.
    Similar arguments hold for the backward prediction-error at stage m.
    We may thus write

$$ e_{f,n}^{(m)} = e_{f,n}^{(m-1)} + \rho_m\, e_{b,n-1}^{(m-1)} \tag{3} $$

$$ e_{b,n}^{(m)} = e_{b,n-1}^{(m-1)} + \rho_m^*\, e_{f,n}^{(m-1)} \tag{4} $$

    where m = 1, 2, ..., M.

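(3) The minimum prediction-error power $P_m$ at stage m is defined in terms of
    the minimum prediction-error power $P_{m-1}$ at stage m-1 and the reflection
    coefficient $\rho_m$ at stage m as follows:

$$ P_m = P_{m-1} \left( 1 - |\rho_m|^2 \right), \qquad m = 1, 2, \ldots, M \tag{5} $$

    where

$$ P_0 = \frac{1}{N} \sum_{n=1}^{N} |x_n|^2 \tag{6} $$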


A filter having the time series $\{x_n\}$ as input and the prediction-error
series as output is known as the prediction-error filter. The order M of this
filter defines the number of stages contained in the lattice model of Fig. 1.
The coefficients of the prediction-error filter, denoted by $a_m^{(M)}$, may be
computed by using the Levinson recursion:

$$ a_m^{(M)} = a_m^{(M-1)} + \rho_M\, a_{M-m}^{(M-1)*}, \qquad m = 0, 1, \ldots, M \tag{7} $$

where the asterisk denotes complex conjugation. Note that

$$ a_m^{(M)} = \begin{cases} 1, & \text{for } m = 0 \\ 0, & \text{for } m > M \end{cases} \tag{8} $$
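The lattice recursion, the error-power update and the Levinson recursion can be combined into one short routine. The sketch below is an illustration under assumptions rather than the authors' program: the function name `burg`, the 0-based array indexing, and the early exit when the error power has already reached zero are choices made here. The equation numbers in the comments refer to the reflection-coefficient formula (1), the initial errors (2), the lattice updates (3) and (4), the power update (5) with its initial value (6), and the Levinson recursion (7).

```python
import numpy as np

def burg(x, order):
    """Return (a, power, rhos): prediction-error filter coefficients with
    a[0] = 1, final prediction-error power, and reflection coefficients."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    ef = x.copy()                        # forward errors, stage 0 (Eq. 2)
    eb = x.copy()                        # backward errors, stage 0 (Eq. 2)
    a = np.array([1.0 + 0.0j])           # order-0 filter
    power = np.mean(np.abs(x) ** 2)      # P_0 (Eq. 6)
    rhos = []
    for m in range(1, order + 1):
        f = ef[m:].copy()                # e_f at stage m-1, n = m+1..N
        b = eb[m - 1:n - 1].copy()       # delayed e_b at stage m-1
        denom = np.sum(np.abs(f) ** 2 + np.abs(b) ** 2)
        if denom == 0.0:                 # errors already zero: stop early
            break
        rho = -2.0 * np.sum(f * b.conj()) / denom     # Eq. 1
        ef[m:] = f + rho * b                          # Eq. 3
        eb[m:] = b + np.conj(rho) * f                 # Eq. 4
        a_ext = np.append(a, 0.0)
        a = a_ext + rho * np.conj(a_ext[::-1])        # Eq. 7
        power *= 1.0 - abs(rho) ** 2                  # Eq. 5
        rhos.append(rho)
    return a, power, np.array(rhos)

# Example 1: a noise-free complex sinusoid is predicted exactly at order 1.
f0 = 0.1
x_sin = np.exp(2j * np.pi * f0 * np.arange(32))
a1, p1, rho1 = burg(x_sin, order=1)

# Example 2: complex white noise, order 4.
rng = np.random.default_rng(1)
x_noise = rng.standard_normal(64) + 1j * rng.standard_normal(64)
a4, p4, rho4 = burg(x_noise, order=4)
```

For the sinusoid the single reflection coefficient has unit magnitude, so the error power collapses to zero; for noise the reflection coefficients stay inside the unit circle and the error power remains positive.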


Finally, having evaluated the pertinent set of filter coefficients, we may
compute the maximum entropy spectral estimate of the given time series by
using the formula

$$
S_x(f) = \frac{P_M}{2B \left| 1 + \sum_{m=1}^{M} a_m^{(M)} \exp(-j 2\pi m f T_s) \right|^2} \tag{9}
$$

where B is the bandwidth of the time series, and $T_s$ is the sampling interval.
With Nyquist rate sampling, we have $T_s = 1/2B$. Equation (9) clearly
emphasizes the nonlinear nature of the maximum entropy spectral estimator. This
formula is the basis of the Doppler processor to be described next.
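Evaluating the spectral formula at a pre-selected set of frequencies is a single matrix-vector product. In the sketch below the function name `mese_spectrum` and the example filter (one pole placed near normalized frequency 0.1) are assumptions made for illustration; the coefficients would in practice come from the recursion described above.

```python
import numpy as np

def mese_spectrum(a, power, freqs, bandwidth, ts):
    """S_x(f) = P_M / (2B |1 + sum_m a_m exp(-j 2 pi m f Ts)|^2).
    `a` holds a_0 = 1, a_1, ..., a_M; `power` is P_M."""
    a = np.asarray(a, dtype=complex)
    m = np.arange(len(a))
    # Row i of the matrix holds exp(-j 2 pi m f_i Ts) for m = 0..M.
    h = np.exp(-2j * np.pi * np.outer(freqs, m) * ts) @ a
    return power / (2.0 * bandwidth * np.abs(h) ** 2)

# Example: Nyquist sampling with Ts = 1, so B = 1/(2 Ts) = 0.5.
ts = 1.0
bandwidth = 0.5
a = np.array([1.0, -0.9 * np.exp(2j * np.pi * 0.1)])   # assumed 1-pole model
freqs = np.linspace(-0.5, 0.5, 1001)
s = mese_spectrum(a, 1.0, freqs, bandwidth, ts)
f_peak = float(freqs[np.argmax(s)])
z = 10.0 * np.log10(s)      # logarithm of the estimate, in dB
```

The estimate depends on the data only through the filter coefficients and the error power, which is where the nonlinearity noted in the text enters.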


3.  A  Doppler  Processor  Using  the  MESE 


A  radar  Doppler  processor  utilizes  the  effect  of  Doppler  shift  on  the 
echo  reflected  from  a  moving  target  in  that  the  power  spectrum  of  this  echo 
is  centered  about  a  frequency  which  is  shifted  from  the  transmitted  carrier 
frequency  by  an  amount  proportional  to  the  radial  velocity  of  the  target.  In 
addition  to  this  target  echo,  the  received  signal  contains  a  clutter  component 
(produced  by  reflections  from  unwanted  objects  such  as  ground  and  weather 
disturbances)  and  a  receiver  noise  component.  Figure  2  shows  the  block 


diagram  of  a  Doppler  processor  using  the  maximum  entropy  spectral  estimator, 
which  is  designed  to  compute  the  spectral  estimate  Sx(f)  of  the  sampled  form 
of  the  received  signal  at  a  pre-selected  set  of  frequencies,  uniformly  spaced 
across  the  Doppler  band  of  interest.  This  frequency  spacing  is  determined  by 
the  resolution  capability  of  the  MESE.  In  the  Doppler  processor  of  Fig.  2, 
the  logarithm  of  the  MESE  output,  rather  than  the  output  itself,  is  compared 
with  a  threshold  in  order  to  determine  whether  a  target  is  present  or  not. 

The  reason  for  this  operation  is  explained  below. 

In order to properly set the detection threshold level at the Doppler
processor output, we need to know the statistical behavior of Sx(f). Owing
to the nonlinear dependence of Sx(f) on the receiver input, we find that it is
rather difficult to treat this problem analytically.

The results of an extensive computer simulation study [1] have shown that
with white Gaussian noise as input, the statistics of the logarithm of the
spectral estimate Sx(f), that is, the quantity defined by

$$Z_x(f) = 10 \log_{10} S_x(f) \qquad (10)$$

may be closely modeled as Gaussian. This is illustrated in Fig. 3 where we
have plotted the probability of false alarm PFA versus the threshold level VT.
We see that the curve calculated by assuming a Gaussian model for Zx(f) fits
the experimental curve (obtained by Monte-Carlo simulation) almost perfectly.
It is found that for a fixed value of data record length N, the mean value of
Zx(f) decreases with the order M of the prediction-error filter. On the other
hand, for a fixed value of M, the mean value of Zx(f) increases with N. With
regard to the standard deviation of Zx(f), it increases as the filter order M
increases, whereas for a fixed value of M, it decreases as the record length
N increases.
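Under the Gaussian model for Zx(f), the threshold VT that realizes a prescribed false-alarm probability follows directly from the mean and standard deviation of Zx(f). The sketch below illustrates this; the function names are hypothetical, and the mean and standard deviation are assumed to have been estimated beforehand (for example by simulation), since the paper gives no closed-form values for them.

```python
from statistics import NormalDist


def threshold_for_pfa(mean_db, std_db, pfa):
    """Threshold V_T (in dB) such that P[Z > V_T] = pfa for Gaussian Z."""
    # inv_cdf(pfa) is negative for pfa < 0.5, so the threshold lies above the mean.
    return mean_db - std_db * NormalDist().inv_cdf(pfa)


def pfa_for_threshold(mean_db, std_db, vt):
    """False-alarm probability P[Z > vt] for Gaussian Z."""
    return 1.0 - NormalDist(mean_db, std_db).cdf(vt)
```

For PFA = 10^-6 the threshold sits roughly 4.75 standard deviations above the mean of Zx(f).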

With clutter as input, the results of computer simulation show that:

(a) If the spectrum around the frequency at which Zx(f) is computed is
symmetrical, the use of a Gaussian model for Zx(f) is justified.

(b) If the spectrum of the input around the frequency of interest is
unsymmetrical, the statistics of Zx(f) deviate markedly from a Gaussian
model. The degree of deviation from a Gaussian model increases with the
slope of the power spectrum of the clutter input at the frequency of
interest. As this slope approaches zero (corresponding to white noise
input), the use of a Gaussian model provides an increasingly better fit
for the statistics of Zx(f).

(c) Increasing the number of data samples N and the filter order M in a
corresponding way tends to reduce the deviation from Gaussian statistics
for Zx(f).

4.  Performance  of  the  Doppler  Processor 

For the MESE to be useful as a means of measuring the unknown Doppler
shift of a moving target, a threshold level must be set at the processor
output so as to realize prescribed values for the probability of false alarm
PFA and probability of detection PD. In the case of thermal noise at the
receiver input as the only source of interference, we simply have to maintain
the mean and variance estimates for each Doppler cell of interest at
prescribed values, since the logarithm of the Doppler processor output, namely,
Zx(f), is Gaussian at all frequencies across the Doppler band. In the case of
a received signal containing a clutter component, however, the situation is
more complex, because the statistics of Zx(f) in the clutter-dominated region
tend to deviate from a Gaussian behavior and the degree of deviation is
dependent on the slope of the spectrum of the received signal at the frequency
of interest. It appears, therefore, that in this case some form of correction
in the threshold settings is required if a Gaussian model is assumed for the
statistics of Zx(f). This is usually true for parametric detectors operating
with unknown input statistics.


In this section, the performance of a Doppler processor using the MESE
is investigated and the results are compared with those of one of two different
configurations, depending on the input conditions:

(1) A Doppler processor using the discrete Fourier transformer (DFT) for the
case of white Gaussian noise at the processor input; this processor is
equivalent to a matched filter for non-fluctuating targets [2], [3].

(2) A Doppler processor using the combination of DFT and double-delay line
canceler [2], [3] for the case of clutter plus white noise at the input.

For this study, only small values of record length N are used, so that the
results are compatible with surveillance radar requirements. Specifically,
for the case of white Gaussian noise at the input, the performance of these
processors is evaluated for N = 8 and N = 16, whereas for the case of white
noise plus clutter, the evaluations are made for N = 10 and 18. The two
extra samples, in the latter case, are needed in order to initialize the
double-delay line canceler (i.e., for transient effects to die out).
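The conventional reference processor for the clutter case can be sketched as follows, showing why N = 10 input samples leave 8 samples for the DFT. This is an illustration under assumptions: the canceler is taken to be the usual three-pulse form with weights (1, -2, 1), which the text does not spell out, and all function names are hypothetical.

```python
import cmath
import math


def double_canceler(x):
    """Double delay-line canceler, assumed form y[n] = x[n] - 2 x[n-1] + x[n-2].
    The first two samples only initialize the delay lines, so len(y) = len(x) - 2."""
    return [x[n] - 2 * x[n - 1] + x[n - 2] for n in range(2, len(x))]


def dft(y):
    """Plain discrete Fourier transform of the sequence y."""
    n = len(y)
    return [sum(y[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n))
            for b in range(n)]


def canceler_dft(x):
    """Magnitudes of the DFT bins computed on the canceler output."""
    return [abs(v) for v in dft(double_canceler(x))]
```

A constant (zero-Doppler) input is removed completely by the canceler, while a component at normalized Doppler frequency 0.25 passes through to the corresponding DFT bin.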


4 . 1  Input  Containing  White  Gaussian  Noise 


Figure 4 shows plots of the probability of detection PD versus the input
signal-to-noise ratio (SNR) for the case of a non-fluctuating target at the
normalized Doppler frequency f = 0.25 at a threshold level set to realize a
probability of false alarm PFA = 10^-6; the record length N = 16. Curve A
applies to the DFT processor, whereas curves B, C, D, and E apply to the MESE
processor with filter order M = 1, 2, 3, and 4 respectively. These results
indicate that for N = 16, a MESE processor using a prediction-error filter of
order M = 2 performs the best. Such a processor requires only 1.3 dB more
SNR than the optimal 16-point DFT processor for PD = 0.9 and PFA = 10^-6.


For the case where N = 8, it is found that a MESE processor with M = 1
provides the best performance, and for this value of M, it requires 1.7 dB
more SNR than the 8-point DFT processor for PD = 0.9 and PFA = 10^-6.

We conclude therefore that a properly designed Doppler processor using
the MESE performs only slightly sub-optimally compared to one using the DFT
in the presence of additive white Gaussian noise at the input.


4.2  Input  Containing  Noise  and  Clutter 

The evaluations in this case are made by using three sets of data labelled
as CD-1, CD-2, and CD-3 whose spectra are shown plotted in Fig. 5. The
conditions at the input are described by specifying two parameters, namely,
the signal-to-noise ratio (SNR) and clutter-to-noise ratio (CNR). For each
target frequency of interest across the Doppler band the probability of
detection is computed with the threshold level set to realize a probability
of false alarm PFA = 10^-6. These computations are carried out for each of the
data sets CD-1, CD-2, and CD-3. In the presence of clutter, it is found that
the order M of the prediction-error filter in the MESE processor has to be
equal to or greater than 2 so as to realize an acceptable target frequency
selectivity.

For the input data set CD-1, Fig. 6 shows different plots of the required
SNR versus the normalized Doppler frequency of the input for N = 10,
PFA = 10^-6, and PD = 0.9, assuming a non-fluctuating target. Curve A refers
to a Doppler processor using the combination of a double delay-line canceler
and 8-point DFT. Curves B and C refer to a Doppler processor using the MESE
with M = 2 and 3, respectively. We see that in this case, the use of the
MESE outperforms the double delay-line canceler-DFT combination by a fairly
large margin. For example, for the case when PD = 0.9 and PFA = 10^-6, we
find that, in the Doppler frequency range 0.03 to 0.12, a MESE processor with
filter order M = 3 requires 3 to 10 dB less SNR than the combination of a
double delay-line canceler and DFT. However, for the case of data sets CD-2
and CD-3, the improvement resulting from the use of MESE is not as large, as
may be seen by examining Figures 7 and 8 respectively.




Based  on  these  results,  we  may  make  the  following  observations: 

1) Compared to a Doppler processor using the combination of a double
delay-line canceler and DFT, a processor using the MESE provides a substantial
improvement in the detection of a slowly moving target in the presence
of clutter with a very narrow spectral width (e.g., ground clutter).
Therefore, by using the MESE, the frequency band of target visibility is
extended by a sizeable margin.

2) In the case of clutter with a narrow spectral width, the order of the
prediction-error filter of the MESE processor should be as high as possible.
In particular, the filter order M should be chosen so as to minimize the
effect of clutter on neighboring Doppler components.

5.  Conclusions 


It has been shown, by means of computer simulation, that in the case of
white noise as input, the statistics of the logarithm of the maximum entropy
spectral estimate may be closely modeled as Gaussian. However, in the
presence of a clutter component, the statistics of the logarithm of this
estimate deviate from a Gaussian model, with the deviation becoming more
pronounced as the slope of the input clutter spectrum is increased.

A processor based on the maximum entropy estimator has been described
for the measurement of the Doppler shift of a moving target. It has been
shown that in the presence of additive white noise at the input, this
processor is only slightly sub-optimal compared to a Doppler processor based
on the discrete Fourier transform, which is optimum for the case of a
non-fluctuating target. In the presence of a clutter component with narrow
spectral width (e.g., ground clutter), however, we find that a Doppler processor
using the MESE is markedly superior to the combination of a double delay-line
canceler and discrete Fourier transformer, for low Doppler targets.

It should be emphasized that, although a MESE-based processor has several
useful features for the processing of radar signals, it does not completely
eliminate the need for other conventional signal processing methods (e.g., the
discrete Fourier transform). Rather, a MESE-based processor may be used as a
way of extending the performance capability of conventional radar processors.
Also, it should be emphasized that there is a need for further investigations
(both theoretical and experimental) concerning the statistical behavior of
the MESE output and the full exploitation of the MESE for radar applications.

6.  References 


(1) K.C. Chan and S. Haykin, March 1979, "Applications of the Maximum Entropy
Method in Radar Signal Processing", Report CRL-62, Communications
Research Laboratory, McMaster University.

(2) R.J. McAulay, February 22, 1972, "A Theory for Optimal MTI Digital
Signal Processing, Part I, Receiver Synthesis", MIT Lincoln Laboratory,
Technical Note 1972-14.

(3) C.E. Muehe et al., June 1974, "New Techniques Applied to Air Traffic
Control Radars", Proc. IEEE, Vol. 62, pp. 716-723.


FIGURE 1. Equivalent lattice structure for the prediction-error
filter (signal labels: forward filter output, backward filter output).


FIGURE 2. Block diagram of a Doppler processor using the MESE (decision
outputs: target is present at the Doppler frequency; no target is present).


FIGURE 6. Curves of signal-to-noise ratio versus Doppler frequency
for the conventional and MESE processors, based on data
set CD1. Clutter-to-noise ratio = 22 dB.


FIGURE 7. Curves of signal-to-noise ratio versus Doppler frequency
for the conventional and MESE processors, based on data
set CD2. Clutter-to-noise ratio = 22 dB. (Vertical axis: required
signal-to-noise ratio, dB.)


APPLICATIONS  FOR  MESA  AND  THE  PREDICTION  ERROR  FILTER 


WILLIAM  R.  KING 

King  Research 
10209  Westford  Dr. 
Vienna,  Virginia  22180 


Abstract 

In recognition of the whitening and resolution characteristics
of MESA and the prediction error filter, it is demonstrated
that these techniques may have several signal processing
applications. Examples are provided to illustrate stable, high
resolution power spectra, high resolution autocorrelation functions,
and signal detection in strong interfering clutter.

Introduction 

In 1967 Burg [1] utilized the prediction error filter and
the maximum entropy method for estimating power spectra in the
frequency domain. Since that time (12 years ago), the maximum
entropy method has had little application for purposes other
than estimation of power spectra in the frequency and wavenumber
domains. Maximum entropy spectral analysis (MESA), which is a
whitening filter, is recognized chiefly for its high resolution
capability when applied to short data sets. However, the whitening
characteristic of MESA may prove to be effective for reducing
clutter in signal detection applications.

In a power spectral analysis MESA is often used in lieu
of the Fourier transform. There are, of course, many applications
for the Fourier transform, both narrowband and broadband,
which may be considered as possible applications for MESA or the
prediction error filter. In a demonstration of the versatility
of MESA and the prediction error filter, these techniques are
used for estimating power spectra, the autocorrelation function,
and for detecting signals in typical radar interference clutter.


The  MESA  Technique 

The MESA algorithm incorporates the Burg technique in all
applications demonstrated in this paper. The MESA-Burg equations
for complex data are given in slide 1.

The MESA power spectrum is expressed by eqn. (1) for the
space-wavenumber domain, where k is the wavenumber, N is the
number of filter weights, Δx is the uniform antenna element
spacing and γn is the nth filter weight. The total noise power
is given by eqn. (2), and the filter weights are evaluated
with use of the iterative eqn. (4). However, the last filter
weight is defined by eqn. (3). The terms of eqn. (3) contain
the forward and backward prediction errors denoted by α and β that
are defined by eqns. (5) and (6) as functions of the data
samples.

Because MESA "snapshot" spectral patterns are inherently
unstable and sometimes contain split spectral peaks, all MESA
examples are computed by incorporating averaging techniques into
the MESA algorithm. King [2] employed several averaging methods,
and concluded that two such methods, averaged covariance
matrix elements and averaged filter weights, are both useful
methods for applying MESA to successive sets of data samples.
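The two averaging methods differ only in which quantity is averaged across successive data sets. A minimal sketch follows; the function names are hypothetical, and the covariance elements are assumed here to be simple lag-product estimates, which may differ from the estimator actually used in [2].

```python
def lag_estimates(x, max_lag):
    """Lag-product estimates r[0..max_lag] from one set of data samples
    (an assumed stand-in for the covariance matrix elements)."""
    n = len(x)
    return [sum(x[k + lag] * complex(x[k]).conjugate() for k in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def average_vectors(vectors):
    """Element-wise average of equal-length vectors, usable either for
    per-set covariance elements or for per-set filter weights."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]
```

Averaging covariance elements would apply `average_vectors` to the per-set outputs of `lag_estimates` before the filter weights are computed, whereas averaging filter weights would apply it to the weights computed separately from each data set.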

Antenna  Patterns 

Stable wavenumber power spectra (antenna patterns) are shown
computed for an 8 element antenna using both averaging methods.
In the first example a signal incident at 5 degrees from broadside
with a SNR of 10 dB (per antenna element) is shown detected
in slide 2 using averaged filter weights. And in slide 3 the
same signal is detected by MESA using averaged covariance matrix
elements. Side peak levels are substantially reduced and whitened
with use of both averaging methods.


Two signals, separated 5 and 6 degrees, with a SNR of
10 dB for each signal (per antenna element) are resolved using both
averaging methods for the maximum number of filter weights permitted
(N=7). The two signals, which are separated 5 degrees,
are observed resolved in slide 4 using filter weights averaged
10 times. And in slide 5, two signals which are separated 6
degrees are shown resolved with use of covariance matrix elements
averaged 10 times. Both averaging methods effectively stabilize
even the highest order MESA patterns, while permitting excellent
resolution and clutter suppression.


SLIDE 3. One signal, averaged covariance matrix elements (horizontal
axis: angle in degrees).


Autocorrelation 


The autocorrelation function r(t), as noted in slide 6,
may be defined by the Fourier transform of the power spectrum
P(ω). And the power spectrum P(ω) may be derived from a time
dependent function x(t) using either the Fourier transform or
MESA as indicated in the second equation of slide 6.
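The conventional (Fourier) route from a sampled signal to its autocorrelation function can be sketched as below. This is only an illustration of the transform-pair relation: the function names are hypothetical, a plain DFT stands in for the FFT, and the result is the circular autocorrelation of the sample block (zero padding would be needed for a linear one).

```python
import cmath
import math


def dft(x, sign=-1):
    """Discrete Fourier sum; sign=-1 is the forward transform, sign=+1 the
    unnormalized inverse."""
    n = len(x)
    return [sum(x[k] * cmath.exp(sign * 2j * math.pi * b * k / n)
                for k in range(n))
            for b in range(n)]


def autocorrelation_via_spectrum(x):
    """r(t) obtained as the inverse transform of the power spectrum |X|^2."""
    n = len(x)
    power = [abs(v) ** 2 for v in dft(x)]
    return [v / n for v in dft(power, sign=+1)]
```

Restricting the power spectrum to a fraction of its bins before the inverse transform reproduces the loss of resolution that appears when the bandwidth is reduced.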

However, the autocorrelation function may also be evaluated
using the prediction error transform by following the derivation
outlined in slide 6. The prediction error is expressed
as a linear function of the appropriately sampled power spectral
density P(ω). The prediction error eω is transformed to the
time domain using the Fourier transform as indicated in slide 6.
By transforming the prediction error to the time domain, the
autocorrelation function r(t) becomes inversely proportional to
the prediction error transform as noted. For demonstration purposes
the autocorrelation function is evaluated with both the
FFT and the prediction error transform. The power spectral density
P(ω), which is required by both methods, is evaluated only
with the FFT.

As an example, a multipath type signal is displayed in
slide 7. The signal consists of two gated sinusoids, one delayed
by one pulse width and shifted in phase by 180 degrees. The
transfer function, which is shown in the middle of slide 7, contains
the anticipated modulation. In all computations the signal
is treated as a complex function, although only the real
part is displayed in slide 7. The autocorrelation function,
shown in the lower section of slide 7, is evaluated with use of
the FFT.

In slide 8 the autocorrelation function is computed for
reduced bandwidths in order to illustrate the advantage of using
the prediction error transform. The bandwidth is reduced by
clipping the transfer function at both ends of the spectrum as
desired. The upper plot of slide 8 depicts the conventional
autocorrelation function (using the FFT) computed at only 10% of
the original bandwidth. Peaks of the autocorrelation function
are not resolved. However, in the middle plot the autocorrelation
function computed using the prediction error transform (and
only 10% bandwidth) contains the usual number of anticipated
peaks. Even with the bandwidth reduced to only 5% of the total
bandwidth, all 3 peaks of the autocorrelation function are present
with use of the prediction error transform. Distortion
effects have become noticeable at the 5% bandwidth level, indicating
the limitations of the prediction error transform.


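SLIDE 6. Derivation of autocorrelation function using the
prediction error transform

Autocorrelation function as Fourier transform:

$$r(t) \sim \int P(\omega)\, e^{i\omega t}\, d\omega$$

MESA using prediction error transform:

$$P(\omega) = \frac{P_N\, \Delta t}{\left| \sum_{n=0}^{N} \gamma_n\, e^{i\omega n (\Delta t)} \right|^2}$$

Transform pairs:

$$r(t) \leftrightarrow P(\omega)$$

Predicted power:

$$\hat{P}_\omega = \sum_{n=1}^{N} a_n\, P_{\omega-n}$$

Prediction error:

$$e_\omega \sim P_\omega - \hat{P}_\omega$$

Prediction error:

$$e_\omega = \sum_{n=0}^{N} \gamma_n\, P_{\omega-n}$$

Fourier transform of prediction error:

$$e_n \sim r(t) \sum_{n=0}^{N} \gamma_n\, e^{-itn(\Delta\omega)}$$

Autocorrelation function as prediction error transform:

$$r(t) = \frac{e_n}{\sum_{n=0}^{N} \gamma_n\, e^{-itn(\Delta\omega)}}$$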


SLIDE  7 


Signal  Detection 


The possibility of detecting signals by increasing the
SNR with MESA was observed previously in the MESA antenna patterns
shown in slides 2 and 3. In yet another demonstration of
the whitening capability of MESA, radar clutter is simulated as
shown in slide 9 where random phased clutter bands have power
levels typical of ground, rain, and interference clutter. Such
a clutter model has been used previously by Sawyers [3] in his
demonstration of adaptive filtering. A signal having a 0 dB
SNR is located between the clutter at the frequency ratio of
.375 as denoted by the arrow in slide 9. The signal is detected
as shown in slide 10, by applying MESA to several sets of
32 data samples and using 24 filter weights. The strong clutter
bands are very effectively whitened by MESA such that the largest
background peak in slide 10 is about 10 dB below the signal
peak level. Similar results may be obtained for any signal location.
For example in slide 11 a signal, located at the center
of the interference clutter (.65), is equally well detected again
with MESA applied to consecutive sets of 32 data samples
using 26 filter weights. In both slides 10 and 11 the MESA
filter weights are averaged over 30 consecutive sets of 32 data
samples. While considerable averaging is used to achieve the
results indicated in slides 10 and 11, less averaging of fewer
filter weights may also achieve satisfactory signal detection,
but with less resolution capability.

It is difficult to imagine that results comparable to
those shown in slides 10 and 11 could be achieved with any conventional
Fourier signal detection method.

References 


1. Burg, John P., "Maximum Entropy Spectral Analysis", presented
at the 37th meeting of Society Explor. Geophys., Oklahoma
City, Oklahoma (Oct. 1967)

2. King, W. R., "Stable MESA Antenna Patterns", NRL Memo Rept.
3998 (May 18, 1979)

3. Sawyers, J. H., "Applying the Maximum Entropy Method to
Adaptive Filtering", Twelfth Asilomar Conference on Circuits,
Systems, and Computers, page 198, (Nov. 6-8, 1978) IEEE
cat. no. 78CH1369-8 C/CAS/CS




MESA  DETECTOR 


THE  MAXIMUM  ENTROPY  METHOD  APPLIED  TO  RADAR 
ADAPTIVE  DOPPLER  FILTERING 

J.  H.  SAWYERS 


Hughes  Aircraft  Company 
Ground  Systems  Division 
Fullerton,  California 


Abstract 

The maximum entropy method of Burg is well known as a means of estimating
high resolution power spectra from short time series data. In this paper, the Burg
method is employed in the calculation of adaptive doppler filter coefficients for a
pulse doppler radar operating in a nonstationary clutter environment. It is
demonstrated by simulation that the adaptive doppler filters converge rapidly and
accurately in severe models of clutter and thermal noise.

Introduction 

High speed digital hardware technology has progressed to the stage of development
where the implementation of many sophisticated radar signal processing techniques
heretofore impractically difficult has become feasible. One area of signal
processing that is receiving increased consideration is adaptive doppler filtering,
which is the subject of this paper.

Modern pulse doppler radar systems are required to operate in nonstationary
clutter environments comprising land, weather, sea, chaff and other interference,
all of which can limit the ability of the radar to detect and track targets if the doppler
filters are not matched to the clutter. The intent of this paper is to demonstrate the
performance of an adaptive doppler filter bank (ADFB) that employs the maximum
entropy method (MEM) of Burg [1] in deriving the filter coefficients in a nonstationary
clutter environment.

The maximum entropy method of power spectrum estimation is well known in
the field of geophysics, and in recent years has attracted attention in other scientific
fields including radar. The acceptance occurs because MEM yields higher resolution
power spectrum estimates from short time series data when compared to conventional
methods of power spectrum estimation [2] - [4]. Implicit in conventional methods
is a window, either weighted or unweighted, that treats missing data as zero thereby
causing spectral sidelobes and a loss of resolution. On the other hand, the MEM
spectrum is noncommittal with regard to missing data but corresponds to the most
random time series whose autocorrelation function agrees with the known or derived
values. Since adaptive filtering is related to power spectrum estimation,
incorporating MEM into adaptive filter designs is appropriate.

The paper begins with a brief description of adaptive doppler filtering by a
pulse doppler radar. The next section outlines how the MEM is used to derive the
adaptive coefficients for the finite-impulse-response (FIR) filters comprising the
ADFB. The results of a Monte-Carlo simulation of selected adaptive doppler filters
of the ADFB are given in the last section. The filters are subjected to severe clutter
including land, rain, bandlimited interference, multiple narrowband point sources
and thermal (white) noise. It is demonstrated that MEM adaptive doppler filtering
is characterized by rapid convergence and overall excellent clutter rejection
properties.


Adaptive  Doppler  Filtering 

The pulse doppler radar in the search mode transmits a pulse train at the
pulse-repetition-frequency (PRF) of 1/T pulses per second. The interpulse spacing, T, is
based on the maximum unambiguous range and doppler frequency requirements of
the particular radar. The unambiguous doppler frequency is bounded by ± 1/2T.
N pulses of the pulse train constitute a dwell, and after each dwell the radar antenna
is directed to a different beam position. At each beam position, the ADFB processes
the N doppler-shifted pulses received from each range resolution cell in the beam.

The  block  diagram  of  the  ADFB  and  target  detection  logic  that  processes  these 
N  pulses  is  shown  in  Figure  1. 

In Figure 1, xk is the complex digitized baseband signal vector of the N received
pulses in a dwell on the k-th scan of a particular range resolution cell (see footnote).
The wk(n)'s are the complex coefficient vectors of individual doppler FIR filters in
the ADFB and are calculated by means of an adaptive algorithm using the MEM. The
design criterion for the individual doppler filters is to provide maximum
signal-to-clutter-plus-noise ratio. This dictates that the wk(n)'s be calculated by the
following equation:

$$\underline{w}_k(n) = \hat{R}_k^{-1}\, \underline{u}(n) \qquad (1)$$


In (1), $\hat{R}_k^{-1}$ is the inverse of the estimated correlation matrix of the clutter
environment that existed on scan k, and u(n) is the steering vector that specifies the
doppler frequency fn at which the signal-to-clutter-plus-noise ratio (SCNR) is to be
maximized. u(n) is given by

$$\underline{u}(n)^{T} = \left[ e^{-j\pi(N-1) f_n T},\; e^{-j\pi(N-3) f_n T},\; \ldots,\; e^{j\pi(N-1) f_n T} \right] \qquad (2)$$


Footnote: The matrix notation used is as follows: Lower case letters are scalars;
underlined lower case letters are vectors; and upper case letters are square matrices.
The symbols *, T and † denote complex conjugate, transpose and complex conjugate
transpose, respectively.


where the fn's are uniformly distributed in the interval |fn| ≤ 1/2T. The method
of calculating $\hat{R}_k^{-1}$ is given in the next section.
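Equations (1) and (2) can be illustrated with a short sketch. The function names are hypothetical, the argument `fn_t` stands for the product fn·T, and the inverse correlation matrix is assumed to be supplied as a list of rows (its calculation is a separate step).

```python
import cmath
import math


def steering_vector(num_pulses, fn_t):
    """Steering vector u(n) of eq. (2); fn_t = f_n * T with |fn_t| <= 0.5.
    Element i carries the phase -pi * (N - 1 - 2i) * f_n * T."""
    return [cmath.exp(-1j * math.pi * (num_pulses - 1 - 2 * i) * fn_t)
            for i in range(num_pulses)]


def filter_weights(r_inv, u):
    """Filter coefficients w = R^{-1} u of eq. (1); r_inv is a list of rows."""
    return [sum(r * ui for r, ui in zip(row, u)) for row in r_inv]
```

With an identity matrix in place of the inverse correlation matrix (white noise only), the weights reduce to the steering vector itself.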

In the detection process, the magnitudes of the outputs from the N doppler
filter-detector combinations are calculated by the following:

$$v_k(n) = \left| \underline{w}_k^{\dagger}(n)\, \underline{x}_k \right| \qquad (3)$$

The maximum vk(n) is automatically selected and compared to a predetermined
threshold setting that is based on false alarm considerations. If vk(n)max exceeds
the threshold, a target detection is declared at the corresponding doppler frequency fn,
and the data xk is inhibited in the adaptive algorithm from contributing to the update
of $\hat{R}_k^{-1}$.
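The detection logic can be sketched as follows. The function name `detect` and its return convention are hypothetical; the threshold is assumed to be given, since its value depends on false alarm considerations not detailed here.

```python
def detect(weight_bank, x, threshold):
    """Form v(n) = |w(n)^dagger x| for every filter in the bank, select the
    largest output and compare it to the threshold.

    Returns (index of the strongest filter, its output magnitude, decision).
    """
    outputs = [abs(sum(w.conjugate() * xi for w, xi in zip(weights, x)))
               for weights in weight_bank]
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return best, outputs[best], outputs[best] > threshold
```

When the decision is true, the corresponding data vector would be withheld from the adaptive update, as described in the text.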

The MEM Adaptive FIR Filter [5]

In radar adaptive doppler filtering, it is necessary to calculate and update
$\hat{R}_k^{-1}$ periodically in order to account for changing clutter conditions. The
computational procedure used by the MEM adaptive filter algorithm in Figure 1 to
accomplish this is as follows:

The last coefficient of each $\ell$-th order prediction filter, $a_{\ell,\ell}(k)$, $\ell = 1, \ldots, L$,
where $L \le N/2$, is calculated from $\underline{x}_k$ and stored. The procedure for calculating
the $a_{\ell,\ell}(k)$'s from the $\underline{x}_k$'s is given elsewhere [6,7]. After a selected number of
scans, $k = m, \ldots, n$, the averages of the $a_{\ell,\ell}(k)$'s are calculated by

$$ a_{\ell,\ell} = \frac{1}{n-m+1} \sum_{k=m}^{n} a_{\ell,\ell}(k) \tag{4} $$

for $\ell = 1, \ldots, L$. Next, the L-th order prediction filter coefficients, the $a_{L,i}$'s, are
calculated from the $a_{\ell,\ell}$'s by means of the recursion

$$ a_{\ell,i} = a_{\ell-1,i} + a_{\ell,\ell}\, a^{*}_{\ell-1,\ell-i} \tag{5} $$
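
A minimal sketch of the averaging in (4) followed by the order recursion in (5) is given below. The function names `average_last_coeffs` and `step_up` are hypothetical. The conjugate on the second term is an assumption, consistent with the complex form of the method [7], because the printed equation is only partly legible.

```python
def average_last_coeffs(per_scan):
    """Average the stored last coefficients over scans, as in equation (4).

    per_scan[k][l-1] holds a_{l,l}(k) for scan k and order l = 1..L.
    """
    count = len(per_scan)
    L = len(per_scan[0])
    return [sum(scan[l] for scan in per_scan) / count for l in range(L)]

def step_up(avg):
    """Build a_{L,1}, ..., a_{L,L} from the averaged a_{l,l} using recursion (5)."""
    a = []  # a[i-1] holds a_{l,i} for the order l reached so far
    for l, k in enumerate(avg, start=1):
        prev = a
        # a_{l,i} = a_{l-1,i} + a_{l,l} * conj(a_{l-1,l-i}) for i = 1..l-1
        a = [prev[i - 1] + k * prev[l - i - 1].conjugate() for i in range(1, l)]
        a.append(k)  # the last coefficient of order l is a_{l,l} itself
    return a

# Hypothetical example: two scans, L = 2.
avg = average_last_coeffs([[0.5, 0.2], [0.3, 0.4]])
coeffs = step_up([0.5, 0.2])
```

Only the $L$ last coefficients per scan need to be stored; the full coefficient set is rebuilt from their averages when the filter weights are updated.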

$R_k^{-1}$ is then calculated from the $a_{L,i}$'s by a simple algorithm that results from choosing
$L \le N/2$. The elements, $z_{i,j}$, of the upper half of the matrix $R_k^{-1}$, scaled by
the average of the L-th order prediction error powers [1], are calculated by

For $i = 1, \ldots, N/2$ and $j = i, \ldots, L + 1$

$$ z_{i,j} = a_{L,i-1}\, a_{L,j-1} + z_{i-1,j-1} \tag{6a} $$

where $a_{L,0} = 1$ and $z_{i,0} = z_{0,j} = 0$.

For $i = 2, \ldots, N/2$ and $j = L + 2, \ldots, i + L$

$$ z_{i,j} = z_{i-1,j-1} - \ldots \tag{6b} $$

For $i = 1, \ldots, N/2 - 1$ and $j = i + 1, \ldots, N/2$

(6c)

(6d)


Combining (6), (2) and (1) yields the desired $\underline{w}_k(n)$'s.


Performance  Analysis  of  the  MEM  ADFB 


For examples showing the performance of the MEM ADFB, we choose N = 32 and
L = 15. The performance is evaluated by Monte-Carlo simulation of radar returns
from three clutter environments. The power density spectrum of the first, illustrated
in Figure 2a, consists of land, rain, band-limited interference and thermal
noise. The second, illustrated in Figure 6a, consists of five narrowband point
sources, doppler offset land clutter and thermal noise. The third is thermal noise
only.


The power density spectrums obtained from the Yule-Walker (YW) equations [3]
using the first L = 15 lags of the exact autocorrelation functions obtained from the
spectrums of Figures 2a and 6a are given in Figures 2b and 6b, respectively.
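
For reference, one standard way to solve the Yule-Walker equations from a set of autocorrelation lags is the Levinson-Durbin recursion, sketched below with the resulting autoregressive spectrum. The paper does not say which solver was used, so this is only an illustration. The names `levinson_durbin` and `ar_spectrum` are hypothetical, and the sign convention (prediction error $e_t = x_t + \sum_i a_i x_{t-i}$) is an assumption.

```python
import cmath

def levinson_durbin(r):
    """Solve the Yule-Walker equations for lags r[0..L].

    Returns (a, P): a[i-1] is the i-th prediction filter coefficient and
    P the final prediction error power.
    """
    a, P = [], r[0].real
    for l in range(1, len(r)):
        acc = r[l] + sum(a[i - 1] * r[l - i] for i in range(1, l))
        k = -acc / P  # last coefficient of the order-l filter
        a = [a[i - 1] + k * a[l - i - 1].conjugate() for i in range(1, l)] + [k]
        P *= 1.0 - abs(k) ** 2
    return a, P

def ar_spectrum(a, P, f, T=1.0):
    """Evaluate the autoregressive power density spectrum at frequency f."""
    denom = 1 + sum(c * cmath.exp(-2j * cmath.pi * f * i * T)
                    for i, c in enumerate(a, start=1))
    return P * T / abs(denom) ** 2

# Hypothetical example: lags of a first-order process with r[m] = 0.5**m.
a, P = levinson_durbin([1.0, 0.5, 0.25])
s0 = ar_spectrum(a, P, f=0.0)
```

For the first-order example the second coefficient comes out as zero, as expected, because the lags are fully described by a one-term predictor.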


The simulated time series representing the radar return from the clutter environment
is generated by taking the discrete Fourier transform of a random phase line
voltage spectrum corresponding to those of Figures 2a and 6a. The phase is assumed
to be independently distributed from line-to-line and dwell-to-dwell. Added thermal
noise is obtained from a Gaussian random number generator. The clutter is assumed
stationary over the period of adaptation in regard to the use of equation (4).
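
The generation of one simulated dwell can be sketched as follows. This is a minimal illustration, not the simulation used for the figures: the function name `simulate_dwell`, the number of spectral lines, the uniform phase distribution and the equal split of the noise power between the real and imaginary parts are all assumptions.

```python
import cmath
import math
import random

def simulate_dwell(line_power, noise_power=1.0, rng=random):
    """Return one dwell of complex samples from a random phase line spectrum.

    line_power[m] is the clutter power of spectral line m (assumed input).
    Each call draws fresh phases, so successive dwells are independent.
    """
    M = len(line_power)
    # Random phase line voltage spectrum: amplitude sqrt(power), uniform phase.
    lines = [math.sqrt(p) * cmath.exp(2j * math.pi * rng.random())
             for p in line_power]
    sigma = math.sqrt(noise_power / 2.0)  # per-component noise standard deviation
    samples = []
    for t in range(M):
        # Discrete Fourier transform of the line spectrum at sample t.
        s = sum(v * cmath.exp(-2j * math.pi * m * t / M)
                for m, v in enumerate(lines))
        samples.append(s + complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)))
    return samples

# Hypothetical example: a single line of power 4 and no thermal noise.
x = simulate_dwell([0.0, 4.0, 0.0, 0.0], noise_power=0.0, rng=random.Random(0))
```

With a single line and no noise, every sample has magnitude 2 and the phase advances by a fixed step from sample to sample, which is a quick check that the transform and scaling behave as intended.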


The optimum response functions and corresponding SCNR's for selected filters
of the ADFB, obtained by means of a closed-form solution from the spectrums of
Figures 2a and 6a, are given for reference in Figures 3a, 4a, 5a and 7a. Similarly,
the responses and corresponding SCNR's for the YW spectrums are given in
Figures 3b, 4b, 5b and 7b. The definition of SCNR is based on the assumption of
unity signal voltage and unity thermal noise power per pulse at the input to the ADFB.
It should be noted that the power density spectrums and the filter response functions
are periodic; one period is shown.


The simulated MEM power density spectrums obtained from one and two dwells
corresponding to the exact spectrums of Figures 2a and 6a are given in Figures 2c,
6c, 2d and 6d, respectively. The corresponding adaptive response functions and
SCNR's for selected filters of the ADFB are given in c and d of Figures 3, 4, 5 and 7.
The figures are arranged for easy comparison.


The filter response functions for thermal noise only are given in Figure 8. Note
that the convergence here is as fast as in the previous examples employing
strong clutter.

Comparing the results given in Figures 2 through 8 shows that excellent
results are obtained after only one dwell of a particular range resolution cell,
regardless of the clutter environment. However, in those cases where the main lobe of
the response function is close to strong clutter (Figure 4, for example), two
or more dwells may be required to improve the SCNR by one dB or so. If the main lobe
of the adaptive doppler filter is entirely within heavy clutter (Figure 5, for example),
the target must be quite large in order to be detected.

While  adaptation  can  occur  on  targets  as  well  as  clutter,  thus  potentially  inhibiting 
their  detection,  targets  can  be  seen  as  they  move  from  one  resolution  cell  to  the  next. 
In  addition,  target  adaptation  effects  can  be  ameliorated  by  performing  the  averaging, 
using  (4),  over  a  number  of  resolution  cells  in  deriving  the  adaptive  filter  coefficients. 

Conclusions 


This analysis has shown, in a limited manner, the potential benefits
offered by MEM adaptive doppler filtering: rapid convergence in arbitrary clutter
environments and adaptive response functions that closely approach the optimum.

References 


[1] J. P. Burg, "Maximum entropy spectral analysis," Ph.D. dissertation,
Stanford University, Stanford, CA, May 1975.

[2] R. T. Lacoss, "Data adaptive spectral analysis methods," Geophysics,
vol. 36, pp. 661-675, Aug. 1971.

[3] T. J. Ulrych and T. N. Bishop, "Maximum entropy spectral analysis and
autoregressive decompositions," Reviews of Geophysics and Space Physics,
vol. 13, pp. 183-200, 1975.

[4] M. Kaveh and G. R. Cooper, "An empirical investigation of the properties
of the autoregressive spectral estimator," IEEE Trans. Info. Theory, vol.
IT-22, No. 3, pp. 313-323, May 1976.

[5] J. H. Sawyers, "Applying the maximum entropy method to adaptive digital
filtering," Conf. Rec. Twelfth Asilomar Conf. on Circuits, Systems and
Computers, pp. 198-202, Nov. 6-8, 1978.

[6] N. Andersen, "On the calculation of filter coefficients for maximum entropy
spectral analysis," Geophysics, vol. 39, No. 1, pp. 69-72, Feb. 1974.

[7] S. Haykin and S. Kesler, "The complex form of the maximum entropy method
for spectral estimation," Proc. IEEE, vol. 64, No. 5, pp. 822-823, May 1976.




Figure  1.  The  Adaptive  Doppler  Filter  Bank  (ADFB)  and  Target  Detection  Logic 


Figure 2. (a) The power density spectrum of land clutter, rain, band-limited interference and
thermal noise. The clutter-to-noise ratio, CNR = 50.1 dB. (b) The YW spectrum using the
first 15 lags of the autocorrelation function derived from (a). (c) The simulated MEM
spectrum after one dwell. (d) The simulated MEM spectrum after two dwells.


Figure 3. The Adaptive Doppler Filter Responses to the Spectrums of Figure 2. f_n T = 0.37.
(a) The optimum closed-form response. SCNR = 14.8 dB. (b) The YW response. SCNR =
14.7 dB. (c) The MEM response after one dwell. SCNR = 14.0 dB. (d) The MEM response
after two dwells. SCNR = 14.5 dB.


Figure 4. The Adaptive Doppler Filter Responses to the Spectrums of Figure 2. f_n T = 0.5625.
(a) The optimum closed-form response. SCNR = 11.9 dB. (b) The YW response. SCNR =
11.2 dB. (c) The MEM response after one dwell. SCNR = 9.4 dB. (d) The MEM response after
two dwells. SCNR = 10.6 dB.


Figure 5. The Adaptive Doppler Filter Responses to the Spectrums of Figure 2. f_n T = 0.65625.
(a) The optimum closed-form response. SCNR = -23.2 dB. (b) The YW response. SCNR =
-23.5 dB. (c) The MEM response after one dwell. SCNR = -29.6 dB. (d) The MEM response
after two dwells. SCNR = -24.4 dB.


Figure 6. (a) The power density spectrum of five narrowband point sources, doppler offset
land clutter and thermal noise. CNR = 50 dB. (b) The YW spectrum using the first 15 lags
of the autocorrelation function derived from (a). (c) The simulated MEM spectrum after one
dwell. (d) The simulated MEM spectrum after two dwells.


MISSION
of
Rome Air Development Center

RADC plans and executes research, development, test and
selected acquisition programs in support of Command, Control,
Communications and Intelligence (C3I) activities. Technical
and engineering support within areas of technical competence
is provided to ESD Program Offices (POs) and other ESD
elements. The principal technical mission areas are
communications, electromagnetic guidance and control,
surveillance of ground and aerospace objects, intelligence data
collection and handling, information system technology,
ionospheric propagation, solid state sciences, microwave
physics and electronic reliability, maintainability and
compatibility.


